Wednesday, January 29, 2014

Ranking tennis players: Method (part 2)

Having previously settled on service games won and lost, with no surface adjustments, as the basis for assessing an individual tennis match, it remains now to take the step from judging a single match to combining the results of over 7000 matches (an approximate total for a full ATP World Tour, ATP Challenger Tour, and Davis Cup World Group season) into an overarching rating.

The first step is simple enough: add up each player’s totals for service games won and lost in the year in question. For instance, in 2011, Novak Djokovic won 1125 service games and lost 667. (I’ll use 2011 data for all my examples here because it’s the year in which my rankings best match the ATP’s, which allows for easier focus on the method itself rather than differences in the outcomes.) That works out to a .628 winning percentage, which happens to be unbelievably good (as were many things about Djokovic’s 2011 season). You do the same for the other 835 players who made at least one appearance in one of the events considered, from world #2 Rafael Nadal down to the three players who played in only one match and lost it 6-0, 6-0.

The raw records themselves can be used for unadjusted rankings; the one I typically use for at-a-glance work when I’m not ready to undertake the draw adjustment is the number of binomial standard deviations above average (that is, above a .500 record). This is calculated as follows:

StDev = (Games won – Games lost) / (Square root of games played)
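
As a quick check, here is a minimal Python sketch of that calculation applied to the Djokovic totals quoted above. The function names are illustrative only. The formula follows from treating each game as a coin flip: over n games, the win total has mean n/2 and standard deviation sqrt(n)/2, so (won – n/2) / (sqrt(n)/2) simplifies to (won – lost) / sqrt(n).

```python
import math

def win_pct(won, lost):
    """Share of games won."""
    return won / (won + lost)

def binomial_stdevs(won, lost):
    """Standard deviations above a .500 record: (won - lost) / sqrt(games played)."""
    return (won - lost) / math.sqrt(won + lost)

# Novak Djokovic's 2011 totals as given in the text.
print(round(win_pct(1125, 667), 3))          # 0.628
print(round(binomial_stdevs(1125, 667), 1))  # 10.8
```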

The top 10 in those unadjusted rankings for 2011 are:

Player                   StDev
Novak Djokovic           10.8
Rafael Nadal             8.9
Roger Federer            8.0
David Ferrer             7.2
Andy Murray              6.8
Tomas Berdych            5.6
Juan Martin del Potro    5.3
Janko Tipsarevic         4.9
Robin Soderling          4.6
Wayne Odesnik            4.3

Most of those are names that are pretty familiar to even casual tennis fans – they’re either the guys who keep winning Grand Slams, or the guys who keep losing to the guys who keep winning Grand Slams. The most obvious exception comes in at the #10 spot, in the person of Wayne Odesnik. If you have never heard Odesnik’s name before, you can be forgiven for the oversight; the ATP rankings placed him in 129th at the end of the 2011 season, and have never listed him higher than 77th. So what is he doing in the #10 spot here?

Odesnik spent most of 2011 on the Challenger Tour, playing (and frequently beating) much weaker opposition than the top 9 players on the list. His appearance leads us to the second part of the ranking: the draw adjustment. The most basic form of this adjustment would simply involve shifting the player’s game winning percentage up or down by looking at the average quality of his opponents. But in a sport like tennis, in which the quality of opposition varies so wildly from match to match, this may not be the best option.

Consider the following hypothetical: A player is in the semifinals of a tournament, meaning that he is two matches away from winning the title. Assume that the average quality of his opponent in those two matches (if he plays both of them) is that of a player somewhere in the middle of the top 100. Would the player be better off with two opponents ranked in the 40-60 range, or would he be better off facing one player outside the top 200 followed by a top-5 foe in the final?

The answer to that question depends very heavily on the identity of the player himself. If the player is one of the best in the world, he would be unperturbed by the prospect of facing two 40-60 players, and would likely win both matches with relative ease. On the other hand, a top-5 matchup in the final would hold significant potential for defeat. If the player in question is somewhere outside the top 100, however, he would likely obtain a preferable outcome in the second scenario – he has a good chance of beating the lower-ranked foe, which at least puts him in the final, whereas he’d be likely to lose in the semi against a better player.

In other words, difficulty of draw is best evaluated within the context of the abilities of the player himself. There is a rating that handles this very well. It is introduced (albeit using football rather than tennis) in this article, in which it is referred to as an Elo rating (I’m not certain that the rating described is actually an Elo rating, though the two are closely related; the Elo rating for chess is described here). I am using a modified version of the rating system laid out in that post. The method starts with the calculation of the player’s expected winning percentage against each of his opponents, and adjusts every player’s ratings until their expected win totals match those they actually recorded.

Instead of the exponential function used in the source article for calculating win probabilities, my choice of formula was simply:

Service game winning percentage (for player A against player B) = A / (A+B)

Where A and B are the ratings of players A and B, respectively. It is still calculated iteratively, and instead of the changes in the ratings being additive and the ratings centering around 0, the changes are multiplicative and the ratings are all positive and centered around 1. Or, putting it in slightly plainer language: Every player is assigned a rating of 1.0 at the beginning of the calculations, everyone’s expected wins are figured with those ratings, and everyone’s rating is then multiplied by (actual service games won / expected service games won). Expected games won are then recalculated with the new ratings, and the ratings are adjusted again. This process is repeated many, many times until actual wins and expected wins are equal.
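
The loop described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the actual code or data behind these rankings: the function names `expected_games` and `fit_ratings`, the fixed iteration count, and the three-player toy data are all hypothetical.

```python
from collections import defaultdict

def expected_games(matches, ratings):
    """Expected games won per player, using P(A wins a game vs. B) = A / (A + B)."""
    expected = defaultdict(float)
    for a, b, games_a, games_b in matches:
        played = games_a + games_b
        p_a = ratings[a] / (ratings[a] + ratings[b])
        expected[a] += played * p_a
        expected[b] += played * (1.0 - p_a)
    return expected

def fit_ratings(matches, iterations=500):
    """Start everyone at 1.0, then repeatedly multiply by actual / expected."""
    actual = defaultdict(float)
    for a, b, games_a, games_b in matches:
        actual[a] += games_a
        actual[b] += games_b
    ratings = {player: 1.0 for player in actual}
    for _ in range(iterations):
        expected = expected_games(matches, ratings)
        ratings = {p: ratings[p] * actual[p] / expected[p] for p in ratings}
    return ratings

# Hypothetical toy data: (player_a, player_b, games won by a, games won by b).
# Every player here wins at least one game, so no rating collapses to zero.
toy_matches = [("A", "B", 12, 8), ("B", "C", 13, 9), ("A", "C", 12, 4)]
toy_ratings = fit_ratings(toy_matches)
for player, rating in sorted(toy_ratings.items(), key=lambda kv: -kv[1]):
    print(player, round(rating, 3))
```

Note that this sketch has no protection against a player with zero actual wins: his rating would drop to 0 on the first pass, and the next pass would then divide by an expected total of 0.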

(Side note: The formula I’m using converges much, much faster than the exponential version laid out in the post linked above, requiring a few hundred iterations rather than many thousands. The projected winning percentages are identical, because the formulas are mathematically equivalent. I wish I had figured this out much earlier than I actually did, or on purpose.)
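
To spell out the equivalence: assuming the exponential version is the standard logistic form with additive ratings a and b (an assumption here, since the linked post's exact formula is not reproduced), setting A = e^a and B = e^b turns it into the ratio form used above.

```latex
P(\text{A wins}) = \frac{1}{1 + e^{-(a - b)}}
                 = \frac{e^{a}}{e^{a} + e^{b}}
                 = \frac{A}{A + B},
\qquad A = e^{a},\; B = e^{b}.
```

Under that mapping, adding a constant to every additive rating corresponds to multiplying every ratio-style rating by the same factor, which leaves the projected percentages unchanged.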

There are some issues with this style of ranking, which are described in the post linked above and the posts it links to in turn. They boil down to the fact that it skews the ratings of unbeaten players (which isn’t too bad, because tennis players don’t generally go unbeaten in matches, let alone service games) and winless players (uh oh).

As mentioned earlier, there were three players in 2011 who went 0-12 in service games. If you apply one iteration of the formula to them, you get Rating (1) = 1.0* (Actual wins / Expected wins). Since actual wins are 0, this gives a rating of 0, meaning none of these players would ever be expected to accomplish anything against anyone, ever. This may not be an entirely realistic vision of the tennis prowess of these individuals, especially if their blowout loss came against a high-level opponent.

The same problem also crops up on the other end of the spectrum, albeit less frequently. Still, since the top end of the list holds significantly more interest than the bottom, it’s a much bigger problem when it does arise. The critical case emerged when calculating the ratings for 2010, in the person of Dmitri Sitak.

If you’ve never heard of Wayne Odesnik, there is absolutely no chance you’re familiar with Dmitri Sitak. He has never made it into the top 300 in the ATP rankings, with even his dubious high point having been achieved over a decade ago, and has spent his singles career playing low-level events with moderate success at best.

In 2010, his entire playing log among considered matches (that is, main draws of ATP World Tour and Challenger Tour events) was as follows:

San Benedetto Challenger:
Defeated Malek Jaziri 6-1, 6-1
Withdrew before next match

And that’s it. That single match provided Sitak with a 12-2 record in service games, good for a winning percentage of .857 which is nearly 3 times as far from .500 as Djokovic’s was in his historic 2011 season. Jaziri was nothing special as an opponent, but he went 2-3 in the other five matches he played on the Challenger Tour that year, so he wasn’t a complete pushover either.

The initial run of the 2010 ratings listed Dmitri Sitak as the best when-on-court player in the world, which is not a remotely sensible result. So what do you do to avoid this outlandish outcome? The article linked above proposed adding a win and a loss (or a few of them) against a dummy opponent for each player; applying that to tennis makes no particular difference in the ratings of the top players, who play well over 1000 service games per year, and provides some slight restraint to the ratings of the guys who barely showed up. A simple version of this modification left Sitak just outside the top 10, which is still far too high.

There is a better solution, which presented itself to me via an examination of the rest of Sitak’s 2010 performance. Remember, the ratings here do not factor in every match played in a given year; they ignore qualifiers and Futures events. In 2010, Sitak spent weeks upon weeks losing in qualifiers and posting mediocre performances in Futures tournaments. While I don’t have the time or inclination to enter all of those matches, I can at least acknowledge the general concept of their existence.

Here’s what I did: Take every player who made the main draw in fewer than 10 recorded (by me) events in a given year. For each event below 10, assign the player a loss to a generic qualifier by a generic score (I’m currently using 13-7, which can be either 7-5, 6-2, or 7-6, 6-1; there was no particularly special reason for this choice of score). So instead of ending up with a 12-2 service game record in 2010, Sitak’s performance is padded with nine 13-7 losses that are intended to represent his actual struggles in qualifying. This results in a total of 75-119, a much better reflection of his abilities that year.
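
The arithmetic of that padding can be sketched as below. The names `MIN_EVENTS`, `PAD_WON`, `PAD_LOST`, and `pad_record` are illustrative, and the sketch only adjusts the raw totals; in the full rating, the padded losses would also need to be charged against a generic qualifier opponent, which is not shown here.

```python
MIN_EVENTS = 10             # events below this threshold get padded
PAD_WON, PAD_LOST = 7, 13   # the padded player loses each filler match 13-7 in games

def pad_record(won, lost, events_played):
    """Add one 13-7 loss for each event short of MIN_EVENTS."""
    missing = max(0, MIN_EVENTS - events_played)
    return won + missing * PAD_WON, lost + missing * PAD_LOST

# Sitak in 2010: a 12-2 record from a single recorded event.
print(pad_record(12, 2, 1))   # (75, 119)
```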

Two primary varieties of players will be affected by this adjustment. First, there are the Sitak types – players who were not good enough to qualify for a sufficient number of the events I’m counting. It is a realistic reflection of the abilities of such players to downgrade them. (Some of the players affected, such as those who won no service games at all, have their ratings pulled upward by the comparatively respectable losses; this is also relatively sensible, as it regresses their ratings toward a still-very-low mean.)

The second player type that will be affected is the one who gets injured. Notable players with fewer than 10 events entered in a year in the seasons I’ve entered include David Nalbandian (2009, 9 events) and Juan Martin del Potro (2010, 3 events). Applying this adjustment to them requires a slightly more involved bit of rationalization, which goes as follows: if a player is not sufficiently fit to take the court for a large portion of the year, it makes sense to me for his rating to be pulled down by that fact. And it has to be a very large portion of the year to keep the player from entering at least 10 events – Rafael Nadal missed the entire 2012 season after Wimbledon, nearly half of the year, and still played in 11 tournaments.

With that modification processed, here are the players with the 10 highest ratings of 2011:

Player                   Rating
Novak Djokovic           2.77
Rafael Nadal             2.38
Roger Federer            2.35
Andy Murray              2.15
David Ferrer             2.05
Juan Martin del Potro    1.92
Tomas Berdych            1.88
Robin Soderling          1.86
Mardy Fish               1.77
Jo-Wilfried Tsonga       1.74
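
As an illustration of how to read these numbers, the ratings can be plugged back into the A / (A+B) formula to project head-to-head game percentages. The sketch below uses the rounded ratings from the table, so its results may differ slightly from projections made with unrounded values; the names `ratings_2011` and `game_win_pct` are illustrative.

```python
# Rounded 2011 ratings copied from the table above.
ratings_2011 = {
    "Novak Djokovic": 2.77, "Rafael Nadal": 2.38, "Roger Federer": 2.35,
    "Andy Murray": 2.15, "David Ferrer": 2.05, "Juan Martin del Potro": 1.92,
    "Tomas Berdych": 1.88, "Robin Soderling": 1.86, "Mardy Fish": 1.77,
    "Jo-Wilfried Tsonga": 1.74,
}

def game_win_pct(rating_a, rating_b):
    """Projected share of games for player A against player B: A / (A + B)."""
    return rating_a / (rating_a + rating_b)

top = ratings_2011["Novak Djokovic"]
for name, rating in ratings_2011.items():
    if name != "Novak Djokovic":
        print(f"Djokovic vs. {name}: {game_win_pct(top, rating):.3f}")
```

The smallest of these projections is the one against Nadal, at about .538.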

Two players are gone from the earlier table. Tipsarevic slips from eighth in the unadjusted rankings to eleventh; Odesnik slides considerably further, from tenth to seventy-second. And Djokovic, of course, dominates; he's projected to take roughly 54% or more of service games from every other player on tour, which is quite a lot.

This rating gives a reasonable picture of player performance when on court – the top five are an exact match with the ATP’s end-of-year list in 2011, and eight of the top ten correspond between the two rankings. But performance when on court is still only part of the overall picture of a tennis player, because it doesn’t account for durability beyond simply playing 10 or more events – Soderling, for instance, played his last match of 2011 (and indeed, his last match ever, to date) in July, thus missing a large chunk of the year. The method of turning this rating into one that can sensibly account for the ability to make it onto the court on a regular basis will be explored in the third and final post laying out this evaluation system.
