My last post concluded with the statement that I would be
presenting a tennis ranking system based on the answers to two questions: How
did you play, and who did you play? Before building such a system, however, it is first necessary to decide on the meanings of the questions themselves. I’ll tackle the queries in order.
First, how did you play? This inquiry could be interpreted in a variety
of ways. Did you win the match? How many sets did you win and lose? How many
service games? How often did you hold serve, or break serve? How many points
did you win on your own serve, or your opponent’s?
My general inclination is to answer a question with as much
information as possible. So, rather than going with simply “yes, the player won
his match,” or “the player won his match 2 sets to 1,” it is more useful to
say, “the player won his match 6-2, 6-7, 6-1.”
The most granular information that one could expect to find
readily available would be points won and lost, preferably split up between
serving and returning. And indeed, this type of information is accessible on the ATP’s website (which is generally very good, and is the main source I'm using for my tennis work) for numerous matches, and I have collected the available point data for the
last five years’ worth of Grand Slams and ATP World Tour events. But the method
I’m presenting here will be based on the more basic option of service games won
and lost rather than serve and return points, for three reasons.
First, the data are more reliable. In examining the point-based statistics for each match, I have noticed cases in which the results as described on
the ATP website are impossible – for instance, a player who won his match in
straight sets despite having broken serve fewer times than his opponent. The simpler
“6-2, 6-7, 6-1” form of match evaluation is tracked on the scoreboard
throughout the match rather than requiring specifically dedicated
point-by-point record keeping, and thus is more likely to be correct.
Second, the simpler data are available further back into
history and for a wider selection of events. Even as recently as 2009, there are a few matches for which
point-based information is not present on the ATP’s site. As you go back
further (which I intend to continue doing over time), the more granular information
disappears entirely. A service game-based evaluation method should be equally
applicable to 2013 and 1983, thus (eventually) providing a means of comparison between current stars and past greats. In addition, neither the ATP website nor the Davis Cup website offers point-based information on Davis Cup matches in readily usable form. Since many top players participate in the Davis Cup on an annual basis, using the simpler description of the match allows the inclusion of an important part of their performance.
Third, the data are frankly much quicker and easier to gather.
As mentioned, I have entered point data for the ATP World Tour matches from the
last five years. I have also entered data from the lower-level Challenger Tour
from those five years, allowing me to better evaluate the second-tier players
on the circuit. There are about 150 Challenger events played every year;
entering only the game-based data for a 32-man event takes roughly 5-7 minutes,
while entering point-based data takes between 20 minutes and half an hour.
Multiply that time difference by 150, and hopefully my motivation for entering
the Challenger matches in games-only form becomes clear.
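For anyone who wants to see that multiplication spelled out, here is a rough back-of-the-envelope sketch. The constant and function names are my own for illustration, and the inputs are just the per-event estimates quoted above, so the output is a ballpark range rather than a measured figure.

```python
# Rough estimates taken from the text; all names here are illustrative.
EVENTS_PER_YEAR = 150            # approximate Challenger events per year
GAMES_ONLY_MINUTES = (5, 7)      # (low, high) entry time per 32-man event
POINT_BASED_MINUTES = (20, 30)   # (low, high) entry time per 32-man event


def extra_hours_per_year(events=EVENTS_PER_YEAR):
    """Return the (smallest, largest) extra hours per year that
    point-based entry would cost compared with games-only entry."""
    # Smallest gap: fastest point-based entry vs. slowest games-only entry.
    low = (POINT_BASED_MINUTES[0] - GAMES_ONLY_MINUTES[1]) * events / 60
    # Largest gap: slowest point-based entry vs. fastest games-only entry.
    high = (POINT_BASED_MINUTES[1] - GAMES_ONLY_MINUTES[0]) * events / 60
    return low, high


low, high = extra_hours_per_year()
print(f"Extra time per year: roughly {low} to {high} hours")
```

That range covers a single year of Challenger events; five years' worth would be five times as much.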
So my primary ranking method will take a player who wins a
match 6-2, 6-7, 6-1 and evaluate his performance by saying, “He won 18 service
games and lost 10.”
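As a minimal sketch of that tally (this is not the spreadsheet I actually use, and the function name is made up for illustration), the conversion from a score line to games won and lost could look like the following. It assumes the score is written from the player's point of view as comma-separated sets of the form "6-2", with no tiebreak annotations such as "7-6(5)".

```python
def tally_games(score: str) -> tuple[int, int]:
    """Return (games_won, games_lost) for a score like '6-2, 6-7, 6-1'.

    Assumes each set is written 'won-lost' from the player's perspective
    and that sets are separated by commas.
    """
    won = lost = 0
    for set_score in score.split(","):
        set_won, set_lost = set_score.strip().split("-")
        won += int(set_won)
        lost += int(set_lost)
    return won, lost


print(tally_games("6-2, 6-7, 6-1"))  # (18, 10)
```

Note that a set lost in a tiebreak simply counts as seven games for the opponent under this scheme.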
On to the second question: who did you play? This question seems simple enough
to answer; just look at the person on the other side of the net. But there is a
significant issue with that, because the abilities of the opposing player are
often heavily dependent on the surface under his feet. Beating Andy Murray on
clay (a surface on which he’s never made it to a final in any tournament) is
not necessarily easy, but it is relatively manageable; beating him on grass (on which he’s won five titles, including a Grand Slam and an Olympic
gold medal) is another matter entirely.
At the moment, I do not intend to adjust for the surface
on which matches are played. There would be significant sample size concerns
with any such adjustment, particularly on less common surfaces such as grass, or the now-rarely-seen carpet. A large number of players will have only one match
in a given year on grass (the first round at Wimbledon), making it impossible
to assess their ability level on that surface with any accuracy whatsoever. (In
a particularly vexing case, Rafael Nadal lost his first-round match at
Wimbledon in 2013 to Steve Darcis, and Darcis promptly withdrew from the event
before his next match. That leaves us with absolutely no way to compare either
player to anyone but each other on grass without stretching our frame of reference
across multiple years, which leads to any number of other issues.) Building a surface adjustment into the rankings would also be far from trivial. And the advantages would be moderate at best. While it would be of some benefit to correct for the difference between Murray's performance on grass and clay, such differences are likely to be quite small when factored into the draw adjustment for a player with 40-plus matches in a year. The odds of a draw being made significantly more favorable than it appears by a confluence of opponent and surface also seem relatively remote, especially because players tend to spend as much time as possible on the surfaces on which they perform best.
Similarly, I will not be compensating for other factors that may influence the
quality of the opponent's play, such as injuries; I have neither the time nor the informational resources necessary to make such adjustments in anything approaching a systematic
fashion.
There is also the question of which matches will be considered. The invaluable Results Archive section of the ATP website includes the main draws from every Grand Slam, World Tour, Challenger Tour, and ITF Futures Tour event of the last decade-plus (quite a bit farther back than that for the higher tours). If you examine the playing activity logs for individual players, you can also find their performance in the Davis Cup and qualifying matches throughout their careers, and the Davis Cup website has its own results available going back numerous years.
These rankings will include performance in the main draws of Grand Slams, ATP World Tour events, and ATP Challenger Tour events, along with the highest-level bracket of the Davis Cup (the World Group). They will not include the Futures Tour, due to the prodigious amount of time that would be required to record the hundreds of Futures events played every year, the massive increase in the player list that would severely overtax my already-strained spreadsheets, and the generally low ranking of the players who participate in those events. They also will not include qualifying matches for any event, because past results from these matches are not available anywhere that I am aware of except in the performance logs for individual players, and I am not inclined to take the time necessary to scour those logs for the requisite information.
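The inclusion rules above boil down to a simple filter. Here is a hedged sketch of how they might be expressed; the level labels and the function name are hypothetical ones chosen for this example, not identifiers taken from the ATP site or from my spreadsheets.

```python
# Hypothetical labels for the event levels that count toward the rankings.
INCLUDED_LEVELS = {
    "grand_slam",
    "world_tour",
    "challenger",
    "davis_cup_world_group",
}


def is_counted(level: str, qualifying: bool = False) -> bool:
    """Return True if a match at this level should count.

    Qualifying matches are excluded at every level, and any level not
    listed above (for example, Futures events) is excluded entirely.
    """
    return level in INCLUDED_LEVELS and not qualifying


print(is_counted("challenger"))                   # True
print(is_counted("futures"))                      # False
print(is_counted("grand_slam", qualifying=True))  # False
```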
So in looking at, say, a 6-2, 6-7, 6-1 win over Denis
Istomin (the current #1 Uzbek player) on a clay court, the system that will be
presented here would say, “The player under consideration won 18 games and lost
10 while facing Denis Istomin on a court that is exactly like any other tennis
court. Unless the match occurred in qualifying, in which case I have no idea what you're talking about.” The limitations that will result are relatively obvious, and should be
kept in mind moving forward.
Next on the agenda will be the process of taking the
results from a player’s matches over the course of a full season and turning them
into a cohesive and (hopefully) sensible rating.