Sunday, March 16, 2014

Stopping the madness: A better way to score NCAA tournament brackets

Bracket season is upon us once again, which means that millions of normally hardworking Americans will spend the next week carefully examining an ordered list of 64 basketball teams and trying to select the winners of the 63 games those teams will play. (Fine, it’s really 68 and 67, but let’s not pretend that most people care overmuch about the opening-round games, especially because most bracket pools ignore them.) Those picks will, for the most part, be entered into online contests which will compare the predicted brackets to the results of the games as they are played, and select the best one as the winner. This should come as news to exactly nobody.

But for all the care put into the individual brackets by the participants, the people running the contests generally put shockingly little thought into the actual method used to evaluate them.

The most popular bracket challenge is run by ESPN.com, which uses the following scoring system:

Round   Points
1       10
2       20
3       40
4       80
5       160
6       320

This is simple enough; the number of points doubles with each round. The simplicity is appealing in and of itself, and a variation on this method is also used by Yahoo and CBS Sports, among others. It also has a simple mathematical basis: there are two teams competing for each second-round spot, four competing for each Sweet Sixteen berth, eight for every slot in a regional final, and so on. Your score for correctly selecting the team on a given line of the bracket is directly proportional to the number of options you could put there.

But a cursory examination of the actual brackets reveals the unsatisfactory nature of this approach. Take the simplest example: If you pick a #1 seed to beat its 16th-seeded opponent in the first round (no 1 seed has ever lost at this stage), you get 10 points. On the other hand, if you successfully predict the biggest upset of the first day, say a #15 beating a #2 (which happens about once every four years), you get… 10 points. This, despite the fact that the second prediction is vastly less likely to come true than the first.

This problem becomes exacerbated if you examine the bracket with a wider scope. Correctly selecting the national champion awards you with 320 points. Over the 29 years of the 64-team format, the tournament has been won by a top-3 seed 26 times, with 18 of those champions being #1 seeds. Barring an upset of rather monumental proportions (for instance, #8 seed Villanova winning the title in 1985), someone in a pool of reasonable size is probably going to pick the winner.

Meanwhile, the first round of the bracket is often famously chaos-laden, making it highly difficult to nail down. ESPN’s bracket challenge features millions of entrants, and in several years of participation, I have never seen anyone come especially close to correctly picking the entire first round; the highest number of correct predictions in the tournament’s first two days is usually in the mid-to-high 20’s out of 32. If, however, someone were to manage the titanic feat of exactly picking every 5-12 upset without picking any that didn’t happen, nailing each of the coinflipesque 8-9 and 7-10 games, and somehow anticipating the win or two by seeds 13 or higher that seem to slip in every year, this astonishing achievement in prognostication would be rewarded with… 320 points, the same number that would go to any of the thousands of people who picked the right top seed as the last team standing. And that understates things, because it doesn’t count the points those who correctly pick the champ get for picking them to win their preceding five games, which add up to a further 310. If every-first-round-game guy doesn’t nail the last one as well, he’s still almost certain to lose the challenge despite his once-in-a-thousand-years feat.
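The arithmetic in the last few paragraphs can be checked with a short sketch. It assumes only the ESPN point values listed above; the names `ROUND_POINTS` and `espn_path_points` are made up for illustration.

```python
# ESPN-style scoring as described above: 10 points in round 1, doubling each round.
ROUND_POINTS = [10 * 2 ** r for r in range(6)]   # [10, 20, 40, 80, 160, 320]

def espn_path_points(rounds_won):
    """Total points for correctly picking one team to win `rounds_won` games."""
    return sum(ROUND_POINTS[:rounds_won])

perfect_first_round = 32 * ROUND_POINTS[0]   # all 32 first-round games correct
champion_pick_only = ROUND_POINTS[5]         # the title pick by itself
champion_full_path = espn_path_points(6)     # title pick plus the five wins before it

print(perfect_first_round, champion_pick_only, champion_full_path)  # 320 320 630
```

A perfect first round and a single correct champion pick are worth the same 320 points, and the champion's full path is worth nearly twice that.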

Surely there’s something that can be done about this, right?

This post wouldn’t be worth much if the answer were “no.” What can be done is to take a look at the scoring system and what it should ideally represent, which will hopefully result in reasonable rewards for the various possible picks. I would define “reasonable rewards” based on the proposition that any pick you make should have the same expected payout over an infinitely large number of brackets. The way to manage this is to simply set the following relation:

Points awarded for correct pick = 1/(Odds of the pick being correct)

So, since #1 seeds have historically won all of their games against 16s, the odds of that pick being correct are either 100% or very close to it, and the reward for picking the 1 seed is 1 point, or very close to it. Meanwhile, 15 seeds have a record of 7-109 against #2’s, so the upset has come through in 7 of 116 tries, and successfully predicting one would be expected to be worth 16.6 points (116/7), give or take.
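As a minimal sketch of that relation (the function name `points_for_pick` is mine, not part of any library), the reward is just the reciprocal of the probability. One thing worth noticing is that 1/p is always exactly one more than the losses-per-win ratio, so the two figures for the 15-over-2 upset differ by a single point.

```python
def points_for_pick(probability):
    """Points awarded for a correct pick: the reciprocal of its probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / probability

# Historical record of 15 seeds against 2 seeds, as quoted above.
wins, losses = 7, 109
p_upset = wins / (wins + losses)

print(round(points_for_pick(p_upset), 2))   # 16.57  (116 / 7)
print(round(losses / wins, 2))              # 15.57  (109 / 7, the odds against)
print(points_for_pick(1.0))                 # 1.0    (a sure thing earns one point)
```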

However, it’s not possible to run these numbers using simple historical performance by the various seeds, because there are several possible events that have never happened, and would therefore be expected to have historical odds of 0 (thus resulting in infinite points for someone who successfully picks the first one). For instance, before last year’s tournament, no 15 seed had ever reached the Sweet Sixteen, and no 9 seed had made the Final 4 in the 64-team era. It famously remains the case that no 16 seed has ever upset a #1; it is also true that no team seeded 13th or lower has made the Elite Eight, and no 7 or 10 seed has made the Final Four (although four 8’s, one 9, and three 11’s have done so). All of these things will happen eventually, given time; it’s just that they haven’t happened in the 29 years of 64-team brackets to date.

So if not through direct historical performance, how can we estimate the odds of a 16-seed beating a 1 seed – or, for that matter, a 16-seed making the Final Four or winning the title? The answer I’m using comes in two parts. The first is Pythagorean winning percentage, a tool originally created by Bill James for use in baseball. The idea is to estimate a team’s winning percentage based on runs scored and allowed (or, in basketball, points). The initial formula in baseball was:

Winning percentage = RS^2 / (RS^2 + RA^2)

Later modifications have changed the value of the exponent and made it dependent on the scoring level in the team’s games, neither of which is relevant here. In basketball, the Pythagorean exponent is 13.9, instead of 2.
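For anyone who wants to play along, here is a small sketch of the formula with the exponent as a parameter. The function name and the 75-70 scoring averages in the example are hypothetical; only the formula and the 13.9 exponent come from the text above.

```python
def pythagorean_win_pct(points_for, points_against, exponent=13.9):
    """Pythagorean winning percentage: PF^x / (PF^x + PA^x).

    Written in ratio form, 1 / (1 + (PA/PF)^x), which is algebraically the
    same thing but avoids raising large point totals to a big power.
    """
    if points_for <= 0 or points_against < 0:
        raise ValueError("points_for must be positive, points_against non-negative")
    return 1.0 / (1.0 + (points_against / points_for) ** exponent)

# Hypothetical team that scores 75 and allows 70 per game.
print(pythagorean_win_pct(75, 70))               # basketball exponent
print(pythagorean_win_pct(75, 70, exponent=2))   # original baseball exponent
```

The larger exponent reflects the fact that a given scoring margin is far more decisive in basketball than the same ratio of runs would be in baseball.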

I calculated the Pythagorean winning percentage for each seed across every matchup that has occurred in the last 29 years of NCAA tournaments. (I tried just lumping all the games together before calculating Pythagorean winning percentage, but the early matchups skewed the numbers too much – 1 seeds generally hammer 16s by such huge margins that their overall win totals were vastly overstated. Going matchup-by-matchup resolved this issue.) Using the historical points scored and allowed numbers, here are the expected winning percentages for the higher seed across each first-round matchup:

Matchup    Actual W%   Pythag W%
1 vs. 16   1.000       .992
2 vs. 15   .940        .960
3 vs. 14   .853        .911
4 vs. 13   .784        .868
5 vs. 12   .647        .712
6 vs. 11   .664        .676
7 vs. 10   .603        .628
8 vs. 9    .483        .501

The correspondence is respectable, if imperfect; in particular, the relatively common upsets are under-predicted. This is likely caused by the fact that not all 14 seeds, for instance, are the same; the variation in their quality should make upsets more likely than would otherwise be expected.

Using Pythagorean wins provides a way to come up with non-zero (if still minute) odds for upsets that have not yet occurred, but it still leaves no option for projecting matchups that have never arisen. No 16 seed has ever faced a non-1-seeded foe. 15s have a few more data points, with opponents including 2, 3, 7, and 10. There are gaps in every seed’s historical opponent list, all the way to the top (no 1 seed has ever squared off with a 14 or 15). Since we want a way to predict the odds of a 16 seed making the Final Four, we also need a way of calculating their chances against the 8 or 9 they would face in the second round, and the teams that would be waiting beyond that.

The second part of the answer comes in the form of a tool I’ve used previously for tennis – the multiplicative Elo rating, which takes each seed’s performance against its schedule of opposing seeds and generates a strength rating, which is used to estimate winning percentage as follows:

Winning percentage = (Rating of team A) / (Rating of team A + Rating of team B)

The ratings are adjusted iteratively until they match the historical Pythagorean win totals across seed matchups. In case anyone is curious, they are:

Seed   Rating
1      36.60
2      21.56
3      16.26
4      11.91
5      10.63
6      9.27
7      8.20
8      5.81
9      5.33
10     5.48
11     4.65
12     5.01
13     1.83
14     1.64
15     0.85
16     0.29
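The post doesn't spell out the exact update rule behind "adjusted iteratively," so the following is only a sketch of one standard way to do it: the fixed-point iteration commonly used for Bradley-Terry-style models, which have the same A / (A + B) form. The function names and the three-team data set are invented for illustration; a real fit would use the historical game counts and Pythagorean win totals for each pair of seeds.

```python
def fit_ratings(games, wins, iterations=500):
    """Fit ratings so that expected wins match the supplied win totals.

    games[(a, b)] is the number of games played between a and b;
    wins[a] is the total (possibly fractional) number of wins credited to a.
    Each pass sets rating = wins / sum(n_ab / (rating_a + rating_b)).
    """
    ratings = {team: 1.0 for team in wins}
    for _ in range(iterations):
        updated = {}
        for team in wins:
            denom = sum(n / (ratings[a] + ratings[b])
                        for (a, b), n in games.items() if team in (a, b))
            updated[team] = wins[team] / denom if denom else ratings[team]
        ratings = updated
    return ratings

def expected_wins(ratings, games):
    """Win totals implied by the ratings, using A / (A + B) for each matchup."""
    totals = {team: 0.0 for team in ratings}
    for (a, b), n in games.items():
        p = ratings[a] / (ratings[a] + ratings[b])
        totals[a] += n * p
        totals[b] += n * (1 - p)
    return totals

# Toy data (not from the tournament): three teams, ten games per pairing.
games = {("A", "B"): 10, ("A", "C"): 10, ("B", "C"): 10}
wins = {"A": 14.0, "B": 10.0, "C": 6.0}

ratings = fit_ratings(games, wins)
fitted = expected_wins(ratings, games)
print(ratings)
print(fitted)
```

Only the ratios between ratings matter here, so the overall scale that comes out of a fit like this is arbitrary.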

The ratings decrease steadily, with a few slight exceptions (10s score marginally higher than 9s, and 12s than 11s). More notable is the fact that the sharp drop-offs are in sensible places; for instance, the 8-through-12 seeds are bunched together (as the bottom tier of at-large teams), and there’s a big dip down to the 13 seeds (the automatic qualifiers, who would not be in the field had they not won their conference tournaments). The #1 seeds also tower impressively over all of their competition, which does not come as an earth-shaking surprise.

Let’s repeat the earlier table of first-round matchups with the Elo-projected winning percentages added in:

Matchup    Actual W%   Pythag W%   Elo W%
1 vs. 16   1.000       .992        .992
2 vs. 15   .940        .960        .962
3 vs. 14   .853        .911        .909
4 vs. 13   .784        .868        .867
5 vs. 12   .647        .712        .680
6 vs. 11   .664        .676        .666
7 vs. 10   .603        .628        .599
8 vs. 9    .483        .501        .522
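The Elo column can be rebuilt directly from the published ratings. This is a quick sketch with the ratings copied from the table earlier in the post; `win_prob` is just a name I picked. Because the ratings are rounded to two decimals, the results agree with the table to within about a thousandth rather than exactly.

```python
# Seed ratings as published in the ratings table above.
RATINGS = {1: 36.60, 2: 21.56, 3: 16.26, 4: 11.91, 5: 10.63, 6: 9.27,
           7: 8.20, 8: 5.81, 9: 5.33, 10: 5.48, 11: 4.65, 12: 5.01,
           13: 1.83, 14: 1.64, 15: 0.85, 16: 0.29}

def win_prob(seed_a, seed_b):
    """Projected chance that seed_a beats seed_b: A / (A + B)."""
    return RATINGS[seed_a] / (RATINGS[seed_a] + RATINGS[seed_b])

# First-round pairings always add up to 17 (1 vs. 16, 2 vs. 15, ...).
first_round = {high: win_prob(high, 17 - high) for high in range(1, 9)}
for high, p in first_round.items():
    print(f"{high} vs. {17 - high}: {p:.3f}")
```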

Elo is still overly optimistic about 3 and 4 seeds in the first round, but it has a more realistic picture of 5’s and 7’s. More to the point, it also offers a way to extend projections beyond the first round. Here are the projected odds of a team of each seed reaching each round (or further), given the actual potential matchups they would face and the likelihoods thereof:

Seed   Round of 32   Sweet 16   Elite 8   Final 4   Title game   Champ
1      0.992         0.861      0.672     0.466     0.283        0.164
2      0.962         0.725      0.460     0.218     0.106        0.048
3      0.909         0.621      0.313     0.129     0.054        0.021
4      0.867         0.507      0.155     0.067     0.024        0.008
5      0.680         0.355      0.101     0.041     0.013        0.004
6      0.666         0.272      0.102     0.030     0.009        0.002
7      0.599         0.179      0.073     0.020     0.006        0.001
8      0.522         0.075      0.028     0.008     0.002        0.000
9      0.478         0.064      0.023     0.006     0.001        0.000
10     0.401         0.091      0.029     0.006     0.001        0.000
11     0.334         0.090      0.022     0.004     0.001        0.000
12     0.320         0.113      0.019     0.005     0.001        0.000
13     0.133         0.025      0.002     0.000     0.000        0.000
14     0.091         0.017      0.002     0.000     0.000        0.000
15     0.038         0.004      0.000     0.000     0.000        0.000
16     0.008         0.000      0.000     0.000     0.000        0.000
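Here is a sketch of how a table like this can be built by walking the bracket round by round: a team's chance of winning a given round is its chance of getting there, times the chance of beating each possible opponent weighted by that opponent's chance of getting there. Two things are my assumptions rather than anything stated in the post: the standard seed layout within a region, and treating Final Four and title-game opponents as drawn from the same seed-by-seed distribution in another region (so a 1 seed can meet another 1 seed at even odds). All names are illustrative.

```python
RATINGS = {1: 36.60, 2: 21.56, 3: 16.26, 4: 11.91, 5: 10.63, 6: 9.27,
           7: 8.20, 8: 5.81, 9: 5.33, 10: 5.48, 11: 4.65, 12: 5.01,
           13: 1.83, 14: 1.64, 15: 0.85, 16: 0.29}

# Assumed order of seeds down one region of a standard bracket.
REGION = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]

def win_prob(a, b):
    return RATINGS[a] / (RATINGS[a] + RATINGS[b])

def advancement_odds():
    """Return {seed: [P(round of 32), P(Sweet 16), ..., P(champion)]}."""
    reach = {seed: 1.0 for seed in REGION}      # everyone starts in round one
    odds = {seed: [] for seed in REGION}
    block = 1                                   # size of each side of a matchup
    for _ in range(4):                          # four rounds inside the region
        updated = {}
        for i, seed in enumerate(REGION):
            start = (i // (2 * block)) * (2 * block)
            if i < start + block:               # opponents sit in the other half
                rivals = REGION[start + block:start + 2 * block]
            else:
                rivals = REGION[start:start + block]
            updated[seed] = reach[seed] * sum(
                reach[r] * win_prob(seed, r) for r in rivals)
        reach = updated
        block *= 2
        for seed in REGION:
            odds[seed].append(reach[seed])
    for _ in range(2):                          # national semifinal and final
        total = sum(reach.values())             # normalise the opponent pool
        reach = {seed: reach[seed] * sum(
                     reach[r] / total * win_prob(seed, r) for r in REGION)
                 for seed in REGION}
        for seed in REGION:
            odds[seed].append(reach[seed])
    return odds

odds = advancement_odds()
for seed in sorted(odds):
    print(seed, " ".join(f"{p:.3f}" for p in odds[seed]))
```

With the rounded ratings, the first two entries for a 1 seed come out at roughly 0.992 and 0.861, in line with the table. A handy sanity check is that each column sums to the number of slots available per region: 8, 4, 2, 1, then 0.5 and 0.25 for the two national rounds.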

The 0.000’s in the table are, of course, not actually 0; they’re simply “less than 0.0005.” The smallest of them is naturally the chance of a 16 seed winning the title, which is estimated at 4.58 * 10^-11. Not great odds for the 16s, which makes sense.

The next table will be the inverse of the odds, which also makes it the number of points awarded for correctly picking a team of that seed to reach the round in question:

Seed   Round of 32   Sweet 16   Elite 8     Final 4     Title game   Champ
1      1.01          1.16       1.49        2.15        3.53         6.09
2      1.04          1.38       2.18        4.59        9.46         20.78
3      1.10          1.61       3.20        7.78        18.54        47.56
4      1.15          1.97       6.44        14.86       42.40        131.55
5      1.47          2.82       9.88        24.27       74.30        248.42
6      1.50          3.68       9.83        33.17       110.88       406.80
7      1.67          5.57       13.67       49.95       181.29       725.39
8      1.92          13.36      35.97       129.33      599.61       3100.04
9      2.09          15.64      44.40       169.56      838.56       4637.28
10     2.50          10.95      34.29       165.42      801.35       4337.06
11     2.99          11.10      46.41       252.70      1387.61      8555.68
12     3.12          8.81       51.94       207.40      1075.42      6247.66
13     7.50          40.39      533.83      4751.24     5.69*10^4    7.84*10^4
14     10.94         58.53      558.72      7159.29     9.49*10^4    1.45*10^6
15     26.23         237.05     3211.98     7.41*10^4   1.79*10^6    5.01*10^7
16     126.69        2551.39    8.18*10^4   4.03*10^6   2.74*10^8    2.18*10^10

Those are single-line totals, not cumulative (that is to say, accurately predicting a 7 seed in the Final Four would get you the sum of the first four columns in row seven, 1.67 + 5.57 + 13.67 + 49.95 = 70.86, because you'd pick them in each of their three prior matchups as well). As a quick example of how the system works, let’s use it on last year’s bracket.

Louisville, the top overall seed, won the national title. ESPN’s scoring system would award 630 total points for that selection. Michigan, a 4 seed, made the title game, resulting in 310 points for a correct pick. Wichita State became the first #9 seed to make the Final Four in the 64-team era; 150 points for that one if you got it. And Florida Gulf Coast became the first-ever 15 seed in the Sweet 16, which is worth all of 30 points for a correct selection. So even if you had somehow picked Michigan, Wichita State, and FGCU to reach the points in the bracket that they actually did, which would be quite impressive, if you happened to have Louisville going out early, you still didn’t necessarily fare well in your bracket pool.

By comparison, the method proposed here awards 15.42 total points for correctly picking a 1 seed as the champ. It offers 66.83 for putting the right 4 seed in the title game, 231.69 for a 9 seed in the Final 4, and 263.28 for a 15 in the Sweet 16. And if someone had completely aced the first round, they'd have started day 3 of the tournament with 92.43 points, more than would have been available from predicting Louisville over Michigan in the final. In this method, accurately selecting a top-seeded titlist is beneficial, but if someone else in your pool nails this year’s Cinderella team, they’re still likely to beat you. Which makes sense, because accurately picking Wichita State in the Final 4 last year would have been vastly more impressive (or at least unusual) than listing Louisville as the champion, and a good scoring system for a bracket should reflect that fact.
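The side-by-side totals for last year's four notable runs can be reproduced with a short sketch. The per-round values are copied from the points table above; since that table is rounded to two decimals, a couple of the sums land a hundredth away from the figures quoted in the text (which presumably come from unrounded values). The names `POINTS`, `proposed_total`, and `espn_total` are illustrative.

```python
# Per-round points for the seeds discussed above, copied from the points table.
POINTS = {
    1:  [1.01, 1.16, 1.49, 2.15, 3.53, 6.09],
    4:  [1.15, 1.97, 6.44, 14.86, 42.40, 131.55],
    9:  [2.09, 15.64, 44.40, 169.56, 838.56, 4637.28],
    15: [26.23, 237.05, 3211.98, 7.41e4, 1.79e6, 5.01e7],
}

def proposed_total(seed, rounds_won):
    """Cumulative points under the proposed system for a correct run."""
    return sum(POINTS[seed][:rounds_won])

def espn_total(rounds_won):
    """Cumulative points under 10-20-40-80-160-320 scoring."""
    return sum(10 * 2 ** r for r in range(rounds_won))

runs = [("Louisville", 1, 6), ("Michigan", 4, 5),
        ("Wichita State", 9, 4), ("Florida Gulf Coast", 15, 2)]
for team, seed, rounds_won in runs:
    print(f"{team}: ESPN {espn_total(rounds_won)}, "
          f"proposed {proposed_total(seed, rounds_won):.2f}")
```

The ordering flips completely: ESPN's totals run 630, 310, 150, 30 from Louisville down to Florida Gulf Coast, while the proposed totals rise from about 15 to about 263.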

Since the points awarded for each pick are intended as the inverse of the odds of the pick, one would hope to have an expected return of N points for picking any team to win N games in the tournament. Our final table will be of the historic average return produced under this system by picking a team of a given seed to win the national title (which ideally should be 6, since a title requires 6 wins):

Seed   Avg points
1      5.68
2      5.57
3      6.62
4      6.67
5      5.76
6      9.65
7      2.79
8      45.77
9      3.98
10     4.95
11     10.88
12     3.07
13     3.71
14     2.61
15     3.63
16     0
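To see why 6 is the target, note that each round contributes (chance of being right) times (points for being right), which is p * (1/p) = 1 whenever the model's odds are accurate. The sketch below checks that for the 1 seed using the two model tables above, then works out one historical piece that can be rebuilt from figures quoted in this post: 1 seeds have won 18 titles out of 29 * 4 = 116 entries. The variable names are mine, and the rest of the historical breakdown isn't reproduced here because the post doesn't list it.

```python
# Model odds and points for a 1 seed, copied from the tables above.
ODDS_1_SEED = [0.992, 0.861, 0.672, 0.466, 0.283, 0.164]
POINTS_1_SEED = [1.01, 1.16, 1.49, 2.15, 3.53, 6.09]

# Under the model, every round is worth about one expected point.
per_round = [p * pts for p, pts in zip(ODDS_1_SEED, POINTS_1_SEED)]
model_expected = sum(per_round)
print([round(x, 3) for x in per_round], round(model_expected, 2))

# Historical contribution of the title round alone for 1 seeds:
# 18 championships across 29 tournaments with four 1 seeds each.
title_rate = 18 / (29 * 4)
title_contribution = title_rate * POINTS_1_SEED[5]
print(round(title_contribution, 3))
```

The title round has paid 1 seeds a little under one point per pick historically, which fits with their overall average sitting slightly below 6.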

With a couple of glaring exceptions (#8 seed Villanova’s 1985 title being the most obvious among them), that comes very close to achieving the system’s stated goal. In particular, seeds 1 through 5, which have produced all but four of the last 58 title game participants, all give expected returns between 5.5 and 6.75, which is… rather satisfyingly close to 6.

And there you have it: an alternative method of scoring brackets that properly rewards a correct surprise pick, and thereby provides adequate inducement for the prediction of upsets (or disincentive to picking favorites throughout).
