Having previously settled on service games won and lost, with
no surface adjustments, as the measure of an individual tennis match, I now need
to take the step from judging a single match to combining the results of over
7000 matches (an approximate total for a full ATP World Tour, ATP Challenger
Tour, and Davis Cup World Group season) into an overarching rating.
The first step is simple enough: add up each player’s totals
for service games won and lost in the year in question. For instance, in 2011,
Novak Djokovic won 1125 service games and lost 667. (I’ll use 2011 data for all
my examples here because it’s the year in which my rankings best match the
ATP’s, which allows for easier focus on the method itself rather than
differences in the outcomes.) That works out to a .628 winning percentage,
which happens to be unbelievably good (as were many things about Djokovic’s
2011 season). You do the same for the other 835 players who made at least one
appearance in one of the events considered, from world #2 Rafael Nadal down to
the three players who played in only one match and lost it 6-0, 6-0.
The raw records themselves can be used for unadjusted
rankings; the one I typically use for at-a-glance work when I’m not ready to
undertake the draw adjustment is number of binomial standard deviations above
average. This is calculated as follows:
$$\text{StDev} = \frac{\text{Games won} - \text{Games lost}}{\sqrt{\text{Games played}}}$$
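As a quick check of that calculation, here is a minimal Python sketch applied to the Djokovic totals quoted earlier. The function name is purely illustrative; the comment spells out why this simple ratio counts as a number of binomial standard deviations.

```python
import math

def stdevs_above_average(games_won, games_lost):
    """Binomial standard deviations above a .500 record.

    With n games and p = 0.5, the standard deviation of wins is sqrt(n) / 2
    and the surplus over n / 2 is (won - lost) / 2, so the ratio reduces to
    (won - lost) / sqrt(n).
    """
    games_played = games_won + games_lost
    return (games_won - games_lost) / math.sqrt(games_played)

# Djokovic's 2011 totals as quoted above: 1125 won, 667 lost
print(round(stdevs_above_average(1125, 667), 1))  # 10.8
```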
The top 10 in those unadjusted rankings for 2011 are:
| Player | StDev |
|---|---|
| Novak Djokovic | 10.8 |
| Rafael Nadal | 8.9 |
| Roger Federer | 8.0 |
| David Ferrer | 7.2 |
| Andy Murray | 6.8 |
| Tomas Berdych | 5.6 |
| Juan Martin del Potro | 5.3 |
| Janko Tipsarevic | 4.9 |
| Robin Soderling | 4.6 |
| Wayne Odesnik | 4.3 |
Most of those are names that are pretty familiar to even
casual tennis fans – they’re either the guys who keep winning Grand Slams, or
the guys who keep losing to the guys who keep winning Grand Slams. The most
obvious exception comes in at the #10 spot, in the person of Wayne Odesnik. If you
have never heard Odesnik’s name before, you can be forgiven for the oversight;
the ATP rankings placed him in 129th at the end of the 2011 season,
and have never listed him higher than 77th. So what is he doing in
the #10 spot here?
Odesnik spent most of 2011 on the Challenger Tour, playing (and
frequently beating) much weaker opposition than the top 9 players on the list.
His appearance leads us to the second part of the ranking: the draw adjustment. The most basic form of this adjustment would simply involve
shifting the player’s game winning percentage up or down by looking at the
average quality of his opponents. But in a sport like tennis, in which the
quality of opposition varies so wildly from match to match, this may not be the
best option.
Consider the following hypothetical: A player is in the
semifinals of a tournament, meaning that he is two matches away from winning
the title. Assume that the average quality of his opponent in those two matches
(if he plays both of them) is that of a player somewhere in the middle of the
top 100. Would the player be better off with two opponents ranked in the 40-60
range, or would he be better off facing one player outside the top 200 followed
by a top-5 foe in the final?
The answer to that question depends very heavily on the
identity of the player himself. If the player is one of the best in the world,
he would be unperturbed by the prospect of facing two 40-60 players, and would
likely win both matches with relative ease. On the other hand, a top-5 matchup
in the final would hold significant potential for defeat. If the player in
question is somewhere outside the top 100, however, he would likely obtain a
preferable outcome in the second scenario – he has a good chance of beating the
lower-ranked foe, which at least puts him in the final, whereas he’d be likely
to lose in the semi against a better player.
In other words, difficulty of draw is best evaluated within
the context of the abilities of the player himself. There is a rating that
handles this very well. It is introduced (albeit using football rather than tennis) in
this article, where it is referred to as an Elo rating. (I’m not certain that the rating described is actually an Elo rating, although the two are closely related; the Elo rating for chess is described here.) I am using a modified
version of the rating system laid out in that post. The method starts with the
calculation of the player’s expected winning percentage against each of his
opponents, and adjusts every player’s ratings until their expected win totals
match those they actually recorded.
Instead of the exponential function used in the source
article for calculating win probabilities, my choice of formula was simply:
Service game winning percentage (for player A against player
B) = A / (A+B)
Where A and B are the ratings of players A and B,
respectively. It is still calculated iteratively, and instead of the changes in
the ratings being additive and the ratings centering around 0, the changes are
multiplicative and the ratings are all positive and centered around 1. Or, putting it in
slightly plainer language: Every player is assigned a rating of 1.0 at the
beginning of the calculations, everyone’s expected wins are figured with those
ratings, and everyone’s rating is then multiplied by (actual service games won /
expected service games won). Expected games won are then recalculated with the
new ratings, and the ratings are adjusted again. This process is repeated many,
many times until actual wins and expected wins are equal.
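The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual code behind these rankings: the function names, the tuple layout for match results, and the three-player sample data are all made up for the example. It also assumes every player has won at least one game, since a winless player's rating collapses to zero (a problem taken up below).

```python
from collections import defaultdict

def expected_wins(results, ratings):
    """Expected games won per player, using A / (A + B) for each match."""
    expected = defaultdict(float)
    for a, b, won_a, won_b in results:
        games = won_a + won_b
        p_a = ratings[a] / (ratings[a] + ratings[b])
        expected[a] += games * p_a
        expected[b] += games * (1.0 - p_a)
    return expected

def fit_ratings(results, iterations=500):
    """results: list of (player_a, player_b, games_won_by_a, games_won_by_b)."""
    actual = defaultdict(float)
    for a, b, won_a, won_b in results:
        actual[a] += won_a
        actual[b] += won_b
    # Everyone starts at 1.0
    ratings = {player: 1.0 for player in actual}
    for _ in range(iterations):
        expected = expected_wins(results, ratings)
        # Multiplicative update: rating * (actual wins / expected wins)
        ratings = {p: ratings[p] * actual[p] / expected[p] for p in ratings}
    return ratings

# Hypothetical sample data, not real match results
sample = [("A", "B", 8, 4), ("B", "C", 7, 5), ("A", "C", 9, 3)]
for player, rating in sorted(fit_ratings(sample).items()):
    print(player, round(rating, 3))
```

A fixed iteration count keeps the sketch short; a fuller version would more likely stop once the largest gap between actual and expected wins falls below some tolerance.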
(Side note: The formula I’m using converges much, much
faster than the exponential version laid out in the post linked above, requiring a few hundred iterations rather than many thousands. The projected winning percentages are identical, because the formulas are
mathematically equivalent. I wish I had figured this out much earlier than I
actually did, or on purpose.)
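One way to see the equivalence, assuming the exponential version is a logistic function of the rating difference (the usual Elo-style form; the linked post's exact scaling constant is not reproduced here, so $k$ is a placeholder): write each multiplicative rating as the exponential of an additive one, $A = e^{ka}$ and $B = e^{kb}$. Then

$$\frac{A}{A+B} = \frac{e^{ka}}{e^{ka} + e^{kb}} = \frac{1}{1 + e^{-k(a-b)}}$$

so an additive rating of 0 corresponds to a multiplicative rating of 1, and the two forms project the same winning percentage for any pair of players.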
There are some issues with this style of ranking, which are
described in the post linked above and the posts it links to in turn. They boil
down to the fact that it skews the ratings of unbeaten players (which isn’t too
bad, because tennis players don’t generally go unbeaten in matches, let alone
service games) and winless players (uh oh).
As mentioned earlier, there were three players in 2011 who
went 0-12 in service games. If you apply one iteration of the formula
to them, you get Rating(1) = 1.0 * (Actual wins / Expected wins). Since actual
wins are 0, this gives a rating of 0, meaning none of these players would ever
be expected to accomplish anything against anyone, ever. This may not be an
entirely realistic vision of the tennis prowess of these individuals, especially if their blowout loss came against a high-level opponent.
The same problem also crops up on the other end of the
spectrum, albeit less frequently. Still, since the top end of the list holds significantly more interest than the bottom, it’s a much bigger problem when it does arise. The
critical case emerged when calculating the ratings for 2010, in the person of
Dmitri Sitak.
If you’ve never heard of Wayne Odesnik, there is absolutely
no chance you’re familiar with Dmitri Sitak. He has never made it into the top
300 in the ATP rankings, with even his dubious high point having been achieved
over a decade ago, and has spent his singles career playing low-level events
with moderate success at best.
In 2010, his entire playing log among considered matches
(that is, main draws of ATP World Tour and Challenger Tour events) was as
follows:
San Benedetto Challenger:
Defeated Malek Jaziri 6-1, 6-1
Withdrew before next match
And that’s it. That single match provided Sitak with a 12-2 record in service games, good for a
winning percentage of .857, which is nearly three times as far from .500 as
Djokovic’s was in his historic 2011 season. Jaziri was nothing special as an
opponent, but he went 2-3 in the other five matches he played on the Challenger Tour that year, so he
wasn’t a complete pushover either.
The initial run of the 2010 ratings listed Dmitri Sitak as
the best when-on-court player in the world, which is not a remotely sensible
result. So what do you do to avoid this outlandish outcome? The article linked
above proposed adding a win and a loss (or a few of them) against a dummy
opponent for each player; applying that to tennis makes no particular
difference in the ratings of the top players, who play well over 1000 service
games per year, and provides some slight restraint to the ratings of the guys
who barely showed up. A simple version of this modification left Sitak just outside the top 10, which is still far too high.
There is a better solution, which presented itself to me via an examination of the rest of Sitak’s 2010 performance. Remember, the ratings here
do not factor in every match played in a given year; they ignore qualifiers and
Futures events. In 2010, Sitak spent weeks upon weeks losing in qualifiers and
posting mediocre performances in Futures tournaments. While I don’t have the
time or inclination to enter all of those matches, I can at least acknowledge the general concept of their existence.
Here’s what I did: Take every player who made the main draw
in fewer than 10 recorded (by me) events in a given year. For each event below
10, assign the player a loss to a generic qualifier by a generic score (I’m currently
using 13-7, which can be either 7-5, 6-2, or 7-6, 6-1; there was no
particularly special reason for this choice of score). So instead of ending up
with a 12-2 service game record in 2010, Sitak’s performance is padded with nine 13-7 losses that are intended to represent his actual struggles in qualifying. This results in a total of 75-119, a much better reflection of his abilities that year.
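The arithmetic of the padding can be sketched as follows. The function and constant names are illustrative, and the sketch only adjusts a player's raw record; in the full rating the padded losses are also matches against a generic qualifier, which this example does not attempt to model.

```python
MIN_EVENTS = 10            # threshold described in the text
PAD_WON, PAD_LOST = 7, 13  # each padded event is a generic 13-7 loss

def pad_record(games_won, games_lost, events_played):
    """Add one generic 13-7 loss for each event short of MIN_EVENTS."""
    missing = max(0, MIN_EVENTS - events_played)
    return (games_won + missing * PAD_WON,
            games_lost + missing * PAD_LOST)

# Sitak in 2010: a 12-2 record from a single event, so nine padded losses
print(pad_record(12, 2, 1))  # (75, 119)
```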
Two primary varieties of players will be affected by this
adjustment. First, there are the Sitak types – players who were not good enough
to qualify for a sufficient number of the events I’m counting. It is a realistic reflection of
the abilities of such players to downgrade them. (Some of the players affected,
such as those who won no service games at all, have their ratings pulled upward
by the comparatively respectable losses; this is also relatively sensible, as
it regresses their ratings toward a still-very-low mean.)
The second player type that will be affected is the one who
gets injured. Notable players with fewer than 10 events entered in a year in the
seasons I’ve entered include David Nalbandian (2009, 9 events) and Juan Martin
del Potro (2010, 3 events). Applying this adjustment to them requires a slightly
more involved bit of rationalization, which goes as follows: if a player is not
sufficiently fit to take the court for a large portion of the year, it makes
sense to me for his rating to be pulled down by that fact. And it has to be a
very large portion of the year to keep the player from entering at least 10 events
– Rafael Nadal missed the entire 2012 season after Wimbledon, nearly half of
the year, and still played in 11 tournaments.
With that modification processed, here are the players with
the 10 highest ratings of 2011:
| Player | Rating |
|---|---|
| Novak Djokovic | 2.77 |
| Rafael Nadal | 2.38 |
| Roger Federer | 2.35 |
| Andy Murray | 2.15 |
| David Ferrer | 2.05 |
| Juan Martin del Potro | 1.92 |
| Tomas Berdych | 1.88 |
| Robin Soderling | 1.86 |
| Mardy Fish | 1.77 |
| Jo-Wilfried Tsonga | 1.74 |
Two players are gone from the earlier table. Tipsarevic slips from eighth in the unadjusted rankings to eleventh; Odesnik slides considerably further, from
tenth to seventy-second. And Djokovic, of course, dominates; he's projected to take at least 54% of service games from every other player on tour, which is quite a lot.
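Plugging the 2011 ratings into the A / (A+B) formula gives a feel for these gaps. This is a small sketch using the ratings as printed to two decimals, so the results can differ slightly from figures computed on unrounded ratings; the function name is illustrative.

```python
def service_game_win_pct(rating_a, rating_b):
    """Projected share of games for player A against player B."""
    return rating_a / (rating_a + rating_b)

# Djokovic (2.77) against Nadal (2.38), the closest rating to his
print(round(service_game_win_pct(2.77, 2.38), 3))  # 0.538

# Djokovic (2.77) against Tsonga (1.74), tenth on the list
print(round(service_game_win_pct(2.77, 1.74), 3))  # 0.614
```

With the rounded ratings, the Nadal matchup comes out a shade under 54%, which is within rounding of the figure quoted.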
This rating gives a reasonable picture of player performance
when on court – the top five are an exact match with the ATP’s end-of-year list
in 2011, and eight of the top ten correspond between the two rankings. But
performance when on court is still only part of the overall picture of a tennis
player, because it doesn’t account for durability beyond simply playing 10 or
more events – Soderling, for instance, played his last match of 2011 (and
indeed, his last match ever, to date) in July, thus missing a large chunk of
the year. The method of turning this rating into one that can sensibly account
for the ability to make it onto the court on a regular basis will be explored
in the third and final post laying out this evaluation system.