Seth Walder, ESPN Analytics
I’m going to let you in on a secret of the probabilistic prediction biz: We get to have it both ways.
Let’s say FPI believes a team has a 72 percent chance to win a game, much like it did for Auburn against UCF in the Chick-Fil-A Peach Bowl. If Auburn had won, no one would have much doubted FPI’s prediction powers: It said the Tigers probably would win, and they did. But when the Knights won, we were able to just shrug our shoulders and say, “Hey, things that are supposed to happen 28 percent of the time do sometimes happen. Twenty-eight percent of the time, in fact.”
See? We win no matter what.
OK, that’s not actually how it works. In reality, FPI wouldn’t take credit — or blame — for any single game’s outcome. That just isn’t (close to) enough of a sample to know if it’s actually doing its job.
But at the same time, ESPN’s analytics team feels strongly that if we’re going to make predictions, we sure better evaluate how those predictions actually fare. And so we do that — with a big assist from the folks at ThePredictionTracker.com.
We want to judge the college FPI model on how it fared predicting all college football games this season. And the way we want to measure that success is by considering its “error” — the difference between the predicted point margin and the actual resulting point margin. We mainly are concerned with two different scores: mean absolute error, and mean squared error (which punishes bigger misses more heavily).
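For readers who want to see the arithmetic, here is a minimal sketch of the two scores in Python. The function names and the four sample games are made up for illustration; this is not FPI's actual code or data, just the standard definitions of the two metrics.

```python
from statistics import mean

def mean_absolute_error(predicted, actual):
    """Average size of the miss, in points, ignoring direction."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))

def mean_squared_error(predicted, actual):
    """Average of the squared misses, so big misses count much more."""
    return mean((p - a) ** 2 for p, a in zip(predicted, actual))

# Hypothetical point margins (positive = home team ahead); illustration only.
predicted_margins = [7.0, -3.0, 14.0, 10.0]
actual_margins = [10.0, -3.0, 4.0, -7.0]

mae = mean_absolute_error(predicted_margins, actual_margins)
mse = mean_squared_error(predicted_margins, actual_margins)
print(mae, mse)  # 7.5 99.5
```

The individual misses here are 3, 0, 10 and 17 points. The 17-point miss accounts for a little over half of the absolute error but nearly three-quarters of the squared error, which is what "punishes bigger misses more heavily" means in practice.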
And while FPI doesn’t want to brag, when we look at those prediction metrics, we see that it had a fantastic year in 2017. FPI’s mean absolute error was 12.4, meaning the actual margin of victory differed from our predicted margin by an average of 12.4 points. That might sound like a lot (a lower error is better), but when predicting 780 football games, that’s actually very good. And it’s all relative to other models. In both error categories, FPI had a lower (better) score than any other public model out there. Here, see for yourself!
In addition, in mean squared error FPI also beat the Vegas opening line. Over the long haul, the Vegas lines are essentially a theoretical limit for a public model. If a public model were consistently beating Vegas, bettors or sportsbooks would use it and the line would shift toward the model.
Here’s another way to verify that FPI is doing its job: calibration.
When FPI says a team has a 72 percent chance to win, like in the example above, we need to make sure that’s about how often that team actually is winning. If teams given a 72 percent chance to win are winning 95 percent of the time, then we have a problem.
If we group together teams by similar predicted win percentages, we can see how they fared as a set compared to their prediction.
All of the prediction groupings fell within their expected range except for the 61- to 70-percent set.
Keep in mind, this uses arbitrary buckets and splits the season into much smaller samples. Therefore, it is less important than the error analysis above but is useful for a quick peek. And if we look at multiple years of calibration, the results end up much closer to the middle of those buckets.
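The bucketing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ESPN's implementation: the function name `calibration_table`, the 10-point bucket width and the six sample games are all hypothetical.

```python
import math
from collections import defaultdict

def calibration_table(games, width=10):
    """Group (predicted win %, won?) pairs into buckets and compare each
    bucket's average prediction with how often those teams actually won."""
    totals = defaultdict(lambda: {"games": 0, "wins": 0, "pred_sum": 0.0})
    for pct, won in games:
        # Upper edge of the bucket: 61-70 -> 70, 71-80 -> 80, and so on.
        upper = math.ceil(pct / width) * width
        bucket = totals[upper]
        bucket["games"] += 1
        bucket["wins"] += int(won)
        bucket["pred_sum"] += pct
    rows = []
    for upper in sorted(totals):
        b = totals[upper]
        rows.append({
            "bucket": f"{upper - width + 1}-{upper}",
            "games": b["games"],
            "avg_predicted_pct": b["pred_sum"] / b["games"],
            "actual_win_pct": 100.0 * b["wins"] / b["games"],
        })
    return rows

# Hypothetical favorites: (predicted win percentage, did the team win?)
sample_games = [(72, True), (75, True), (78, False), (74, True),
                (65, True), (62, False)]
rows = calibration_table(sample_games)
for row in rows:
    print(row)
```

In this toy sample the 71-80 bucket is well calibrated (teams averaging a 74.75 percent prediction won 75 percent of the time), while the 61-70 bucket has only two games, which shows how quickly small buckets become noisy.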
Looking for a more concrete prediction from FPI? While it really has no mathematical bearing on the model’s accuracy, you can revisit our season win total predictions for five teams the model felt strongly about (relative to Vegas) in the preseason. Spoiler alert: All FPI needed to go 5-0 was for Syracuse to win a single game after beating Clemson. Alas…