My simulation matches up the teams in the NCAA bracket and uses one of the schemes below to generate a win probability for a Monte Carlo simulation of each game between the teams.
Probability scheme 1: Sagarin ratings only
The first scheme simply uses the Sagarin ratings to create a probability of team 1 winning: Probability = Team 1 Sagarin / (Team 1 Sagarin + Team 2 Sagarin). I use the Predictor Sagarin rating because that is the one he suggests for predicting the score and outcome of a game. I then draw a random number from 0 to 1: if it is less than the probability above, team 1 wins, otherwise team 2 wins.
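As a rough illustration of the formula and the random draw described above, here is a minimal Python sketch. The function names and the two ratings are made up for the example (they are not real Sagarin values), so treat it as one possible way to code the idea rather than the exact simulation.

```python
import random


def sagarin_win_probability(rating_1, rating_2):
    """Probability that team 1 wins: its share of the combined ratings."""
    return rating_1 / (rating_1 + rating_2)


def simulate_game(rating_1, rating_2, rng=random):
    """Return 1 if team 1 wins the simulated game, otherwise 2."""
    p = sagarin_win_probability(rating_1, rating_2)
    # A uniform draw in [0, 1) below p counts as a team 1 win.
    return 1 if rng.random() < p else 2


# Hypothetical ratings, chosen only to illustrate the calculation.
strong_rating, weak_rating = 95.0, 70.0
p_strong = sagarin_win_probability(strong_rating, weak_rating)

# Repeat the game many times to see the Monte Carlo win rate approach p_strong.
rng = random.Random(2024)
trials = 20000
wins = sum(simulate_game(strong_rating, weak_rating, rng) == 1 for _ in range(trials))
win_rate = wins / trials
```

With these made-up numbers the probability is 95 / 165, a little under 58%, even though the ratings are 25 points apart.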

I calculated every team's probability of winning vs. every other team and then plotted this against the difference in seeds. A difference of -15 means a 1 seed played a 16 seed. This scheme results in probabilities that only vary from about 58% for matchups with a seed difference of 15 down to about 50% for evenly seeded teams. Unfortunately, no 16 seed has ever beaten a 1 seed, so this scheme leaves the games too evenly matched and does not reflect the history of outcomes in the tournament.


Probability scheme 2: Seed difference and tournament history only
Another approach is to use the seeds of the teams in the tournament. With 25 years or so of data, I captured the number of times a favorite beat an underdog based on the seed difference. For instance, a 16 seed has never beaten a 1 seed, while 8 vs. 9 games are almost 50/50. I use the data from the round of 64, round of 32 and round of 16 and then fit a line assuming that even seeds are 50/50 and that a seed difference of 15 (1 vs. 16) will result in a favorite win 99.07% of the time. That represents 1 upset in 108 games. Although this upset has never occurred in 26 years of data, it will happen someday, and that could be as soon as one of the four 1 vs. 16 games this year. Thus (26*4+3) wins / (27*4) attempts is 99.07%.
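The arithmetic and the line can be sketched in a few lines of Python. This assumes the line is pinned to the two points named in the text (50% at a seed difference of 0 and 99.07% at a difference of 15) rather than being a free least-squares fit. The function name is hypothetical, and the absolute value is my own convention so that either sign of the seed difference gives the favorite's probability.

```python
# 26 past tournaments with four 1 vs. 16 games each, plus this year's four
# games with one hypothetical upset: 107 favorite wins in 108 attempts.
P_ONE_VS_SIXTEEN = (26 * 4 + 3) / (27 * 4)
P_EVEN_SEEDS = 0.5
MAX_SEED_DIFF = 15


def favorite_win_probability(seed_diff):
    """Favorite's win probability on the line through (0, 50%) and (15, 99.07%)."""
    slope = (P_ONE_VS_SIXTEEN - P_EVEN_SEEDS) / MAX_SEED_DIFF
    return P_EVEN_SEEDS + slope * abs(seed_diff)


p_1_vs_16 = favorite_win_probability(-15)  # a 1 seed against a 16 seed
p_8_vs_9 = favorite_win_probability(-1)    # an 8 seed against a 9 seed
```

On this line each extra seed of difference adds a bit over three percentage points to the favorite's chances.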



Probability scheme 3: Sagarin ratings scaled by seed difference and tournament history
The final approach combines the two by scaling the average of the Sagarin ratings probability to the expected probability due to seeds as predicted by historical performance. Thus the average for teams at a given seed difference follows the historical seed line, while each matchup keeps its own Sagarin-based deviation from that average. In practice I take the residuals from the line fitted through the Sagarin rating probabilities and add them to the line defined by setting the 15-difference probability to 99.07% and the even-seed probability to 50%.
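Here is one way the residual step could look in Python. It is a sketch under several assumptions of mine: the Sagarin line is an ordinary least-squares fit, seed differences are expressed as 0 to 15 from the favorite's point of view, results are clipped to the 0 to 1 range, and the sample matchups are invented purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def seed_line(seed_diff):
    """Historical seed line: 50% for even seeds, 107/108 (99.07%) at a difference of 15."""
    return 0.5 + (107 / 108 - 0.5) * seed_diff / 15


def combined_probabilities(seed_diffs, sagarin_probs):
    """Add each matchup's residual from the Sagarin line onto the seed line."""
    slope, intercept = fit_line(seed_diffs, sagarin_probs)
    combined = []
    for diff, prob in zip(seed_diffs, sagarin_probs):
        residual = prob - (slope * diff + intercept)
        # Clipping is an added safeguard (an assumption, not from the post).
        combined.append(min(1.0, max(0.0, seed_line(diff) + residual)))
    return combined


# Invented matchups: seed difference and the favorite's Sagarin-only probability.
seed_diffs = [0, 1, 7, 7, 15, 15]
sagarin_probs = [0.50, 0.51, 0.53, 0.55, 0.57, 0.58]
combined = combined_probabilities(seed_diffs, sagarin_probs)
```

Because least-squares residuals sum to zero, the combined probabilities average out to the seed line, while two teams with the same seed difference but different ratings still end up with different chances.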


