AI & Technology · 6 min read

How We Grade Every Prediction: The Transparency Behind Our Accuracy

By Predictify Sports Team · April 16, 2026 · 6 min

Most prediction sites hide their misses. They'll post a screenshot of a winning streak and quietly delete the losing week. We grade every single pick we publish, including the ones that embarrass us. Our La Liga accuracy is 25%. Our model predicted Arsenal to beat Sporting CP at 75% confidence, and the match ended 0-0. We picked Real Madrid at 88%, and they lost at Mallorca. AC Milan at 88%, and they got demolished 0-3 by Udinese. All of that is public, graded, and permanently recorded.

This is how the grading system works, why the mechanics matter, and what you should look for when evaluating any prediction service, including ours.

Why Grading Matters

Without grading, prediction accuracy is just a marketing claim. A site that says “82% accuracy” in a headline but doesn't publish every prediction with its outcome is giving you a number you can't verify. Maybe they cherry-picked their best month. Maybe they excluded draws. Maybe the denominator is conveniently small.

Our approach is different: every prediction gets a public result. When we went 1-for-4 on Champions League quarterfinal first legs (hitting Bayern over Real Madrid at 68% but missing the other three), that record is visible to every user. When our Bundesliga accuracy sits at 38.9% because we can't predict draws, that number is on the accuracy page for anyone to see.

Transparency isn't comfortable. Publishing a 25% accuracy rate in La Liga isn't good marketing. But it's honest, and honesty is the only foundation that lets you actually calibrate how much to trust the model in each sport.

What Gets Graded and When

Every prediction published on the platform gets graded automatically once the match finishes. There is no manual curation, no selection of which picks to grade, and no way to retroactively remove a prediction that missed.

Match result grading happens within hours of the final whistle. For soccer, the system checks whether the predicted outcome (home win, away win, or draw) matched the actual result. For basketball, hockey, and baseball, it checks whether the predicted winner was correct. For combat sports (UFC, boxing), it checks whether the predicted fighter won.

Tennis grading uses sets won rather than games. If the model predicted Player A to win and Player A won 2 sets to 1, that's a hit. Sets are the meaningful unit in tennis, and using them avoids the noise of individual game scores that can vary wildly.
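As a minimal sketch, the outcome comparison for a soccer pick can be expressed in a few lines of Python. The function names here are illustrative assumptions, not the platform's actual code:

```python
def match_outcome(home_goals: int, away_goals: int) -> str:
    """Reduce a final score to one of the three graded outcomes."""
    if home_goals > away_goals:
        return "home"
    if home_goals < away_goals:
        return "away"
    return "draw"

def grade_prediction(predicted: str, home_goals: int, away_goals: int) -> bool:
    """A pick is a hit only if the predicted outcome matches exactly."""
    return predicted == match_outcome(home_goals, away_goals)

# The Arsenal example from above: a home win picked at 75% confidence,
# but the match ended 0-0 -- graded as a miss.
print(grade_prediction("home", 0, 0))  # False
```

The same shape applies to winner-only sports (basketball, hockey, baseball, combat sports): the comparison just has two possible outcomes instead of three.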

Timing matters. Soccer matches typically get graded within 1-2 hours of finishing. For sports that use scoreboard data from third-party feeds, there can be a slightly longer delay, but the grading pipeline runs regularly throughout the day to catch everything. No match falls through the cracks permanently; a safety net catches any ungraded matches within 6 hours.

The Automated Grading Pipeline

The grading process is fully automated. Here's the flow at a high level, without getting into technical specifics (for that, see our How It Works page).

Step 1: The system monitors match status. Once a match is marked as finished in our sports data feeds, it becomes eligible for grading.

Step 2: Final scores are pulled and stored. For team sports, this is the final score. For tennis, it's sets won. For combat sports, it's the winner (and method, when available).

Step 3: The stored prediction is compared against the actual result. The system records whether the prediction was correct (hit) or incorrect (miss), along with the original confidence score.

Step 4: The result is published immediately. There's no review queue, no approval step, no opportunity to filter out embarrassing misses. The graded prediction appears on the predictions page and feeds into the accuracy calculations on the accuracy page.

This automation is the point. If grading required a human to press a button, there would always be a temptation, conscious or unconscious, to delay grading a bad week or to “accidentally” skip a particularly ugly miss. Automation removes that temptation entirely.
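Under assumed data shapes (plain dicts for matches and predictions), the four steps above can be sketched like this. The real pipeline's interfaces aren't public, so treat this strictly as an illustration:

```python
def run_grading_pass(matches, publish):
    """One automated pass over the feed: grade and publish, no review queue."""
    for match in matches:
        # Step 1: only matches marked finished are eligible for grading.
        if match["status"] != "finished":
            continue
        # Step 2: pull the final score from the feed.
        home, away = match["score"]
        # Step 3: compare the stored prediction against the actual result.
        actual = "home" if home > away else "away" if away > home else "draw"
        hit = match["prediction"]["outcome"] == actual
        # Step 4: publish immediately -- no approval step, no filtering.
        publish({
            "match_id": match["id"],
            "hit": hit,
            "confidence": match["prediction"]["confidence"],
        })
```

Note that `publish` is called inside the loop, unconditionally: a miss goes out through exactly the same path as a hit.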

Edge Cases We Handle

Sports don't always produce clean outcomes. Here's how the system handles the messy ones.

Cancelled and postponed matches have their predictions voided. If a match is postponed due to weather or other circumstances, the prediction is removed from the graded pool entirely; it doesn't count as a hit or a miss. This prevents the accuracy numbers from being inflated or deflated by events outside the model's control.

Tennis retirements and walkovers are graded based on the result at the time of stoppage. If the model predicted Player A and Player A was winning when the opponent retired, that counts as a hit. If the leading player retired, the grading follows the official result.

Boxing and UFC draws or no-contests are graded as misses unless that outcome was the prediction. If the model predicted a draw and the fight ended in a draw, that's a hit. These scenarios are rare, but the system handles them consistently.

Draw predictions in soccer that end in narrow wins (or vice versa) are graded strictly. If the model predicted a draw and the match ended 1-0, that's a miss, period. No partial credit. This strict grading is why our La Liga and Bundesliga accuracy numbers look rough: the model predicted draws that ended in narrow wins, and each one counted as a full miss.
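The edge-case rules above amount to a three-way grade: hit, miss, or void. A hedged sketch, with the status values and field names as assumptions of this example rather than the platform's schema:

```python
def grade_with_edge_cases(match):
    """Return "hit", "miss", or "void" for one graded prediction."""
    status = match["status"]
    # Cancelled or postponed: voided, removed from the graded pool entirely.
    if status in ("cancelled", "postponed"):
        return "void"
    if status == "retirement":
        # Tennis retirements and walkovers: grade on the result at the
        # time of stoppage, following the official result.
        winner = match["official_winner"]
    else:
        winner = match["winner"]  # may be "draw" for soccer or combat sports
    return "hit" if match["predicted"] == winner else "miss"
```

The key design point is that "void" is a distinct state, not a miss: voided picks drop out of both the numerator and the denominator of every accuracy figure.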

How to Read the Accuracy Page

The accuracy page is where all of this comes together. Here's what each section shows.

Overall accuracy is the total hit rate across all graded predictions in all sports. This is the headline number, but it's also the least useful because it blends high-performing sports (boxing at 83.3%, NBA at 75.3%) with lower-performing ones (La Liga at 25%). Always drill into sport-specific accuracy for a meaningful picture.

Sport-by-sport breakdown shows accuracy for each sport independently. This is where you should focus. If you mainly bet on MLB, the overall accuracy number is irrelevant; what matters is the MLB-specific accuracy (currently 57.8% across 45 graded picks).

League-level accuracy (for soccer) breaks it down further. Serie A at 60% and La Liga at 25% are both “soccer,” but they're completely different prediction environments. The league-level view is the most honest representation of model performance.
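Computing these rollups is a straightforward group-by over the graded pool. A sketch under assumed record shapes (each graded pick as a dict with a grouping key and a boolean `hit`):

```python
from collections import defaultdict

def accuracy_by_group(graded, key):
    """Hit rate per group (e.g. per sport or per league), as a percentage."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [hits, total]
    for pick in graded:
        tallies[pick[key]][0] += pick["hit"]
        tallies[pick[key]][1] += 1
    return {group: round(100 * hits / total, 1)
            for group, (hits, total) in tallies.items()}
```

Calling it with `key="sport"` gives the sport-by-sport view; calling it with `key="league"` on the soccer subset gives the league-level view.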

You can also browse every individual prediction and its result on the predictions page: every match, every confidence score, every outcome, in chronological order. For more on what confidence scores mean, read How to Read AI Confidence Scores.

What We've Learned from Grading Data

The grading system isn't just for users; it's how we identify where the model needs improvement.

Draw calibration is our biggest weakness. In both La Liga and the Bundesliga, the model predicts draws that end in narrow wins and predicts home wins that end in draws. This specific failure mode accounts for the majority of misses in those leagues. The grading data makes the pattern visible and actionable.

High-confidence picks are well-calibrated. Picks at 85%+ have landed at a rate consistent with their confidence scores across most sports. The model's struggles are concentrated in the 60-75% range, the “slight lean” tier where uncertainty is highest. The grading data confirms that trusting high-confidence picks and being cautious with mid-range ones is the right approach.
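The calibration check itself is simple to express: bucket graded picks by stated confidence, then compare each bucket's hit rate to its range. The bucket edges below mirror the tiers discussed in this article but are otherwise illustrative:

```python
def calibration_buckets(graded):
    """Hit rate per confidence tier; None for an empty tier."""
    buckets = {"60-75%": [], "75-85%": [], "85%+": []}
    for pick in graded:
        c = pick["confidence"]
        if c >= 85:
            buckets["85%+"].append(pick["hit"])
        elif c >= 75:
            buckets["75-85%"].append(pick["hit"])
        elif c >= 60:
            buckets["60-75%"].append(pick["hit"])
    return {tier: round(100 * sum(hits) / len(hits), 1) if hits else None
            for tier, hits in buckets.items()}
```

A well-calibrated model shows each bucket's hit rate landing inside (or near) its confidence range; a bucket that lands well below its range, as the 60-75% tier has for us in draw-heavy leagues, is where recalibration work goes.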

Some leagues are structurally harder. The grading data shows that La Liga (25%) and the Bundesliga (38.9%) are genuinely harder to predict than Serie A (60%) or the NBA (75.3%). This isn't a temporary slump; it reflects structural features of those competitions (tactical upsets in Spain, a draw-heavy mid-table in Germany) that the model may never fully solve.

The Commitment

Every prediction graded. Every miss reported. Every accuracy claim backed by data you can verify yourself.

That's not a tagline; it's a mechanical guarantee built into how the platform works. The grading pipeline runs automatically, the results are published without human review, and the accuracy page updates in real time. If our La Liga accuracy drops to 20%, you'll see it. If our MLB accuracy climbs to 65%, you'll see that too.

We believe transparency is the only credible foundation for an AI prediction service. If you can't verify the claims, the claims are worthless. Everything we publish is verifiable. Hold us to it.

Ready to use AI predictions?

See today's free picks with confidence scores.

See Today's Picks →