Monday, February 3, 2014

Ranking tennis players: Method (Part 3)

The previous two posts have laid out the framework for evaluating tennis players in terms of performance when on the court. All that remains is to take the when-on-court ratings and turn them into an evaluation of an entire season.

Why not just use the on-court ratings themselves? There are two closely related reasons. First, durability is an important part of any sport; your opportunities to achieve increase the more time you're able to spend on the court (or field, or pitch, or rink, depending on your sport of choice). Second, performing at a high level becomes more impressive the longer you do it. Any hitter in baseball can have a week in which he goes 9 for 27; it’s much harder to hit .333 for an entire season. Tennis is the same way. The more matches you play, the more confidence we can have in the level of your ability. There’s also a secondary factor at work in tennis, which is that the players can control which tournaments they play; if someone plays high-quality tennis on the main ATP tour in South America in February and in Europe in April, May, and July, but sits out or only plays Challenger events the rest of the year, he’s very likely to be a clay court specialist who would not necessarily be able to maintain his level on other surfaces.

The process of taking a rate statistic and turning it into a cumulative one is more difficult in this case than it would be in most, because of the nature of the when-on-court ratings I’m using. For many rating systems, you can simply take the difference between the player’s performance and a particular baseline and multiply by playing time. But in the case of this rating, there are two issues with that process. First, the absolute differences in the ratings are not what’s used to evaluate the players. Since the winning percentage projection takes the form of A/(A+B), a player with a rating of 1.5 is projected to win 60% of service games when facing a player rated at 1.0 (difference of 0.5), and so is a player rated at 2.4 facing one evaluated at 1.6 (difference of 0.8). What drives the projection is the ratio of the two ratings (1.5 to 1 in both cases), not the gap between them.
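To make the ratio-versus-difference point concrete, here is a minimal sketch of the A/(A+B) projection. The function name is mine, and the third matchup (2.1 against 1.6) is an invented pair chosen only because it has the same 0.5 gap as the first one.

```python
def service_win_probability(rating_a: float, rating_b: float) -> float:
    """Projected share of service games won by A against B: A / (A + B)."""
    return rating_a / (rating_a + rating_b)


# The two matchups from the text: different gaps, same 3:2 ratio.
same_ratio_small_gap = service_win_probability(1.5, 1.0)
same_ratio_large_gap = service_win_probability(2.4, 1.6)

# Hypothetical matchup: same 0.5 gap as the first pair, but a smaller ratio.
same_gap_other_ratio = service_win_probability(2.1, 1.6)

print(f"1.5 vs 1.0: {same_ratio_small_gap:.1%}")
print(f"2.4 vs 1.6: {same_ratio_large_gap:.1%}")
print(f"2.1 vs 1.6: {same_gap_other_ratio:.1%}")
```

The first two lines should agree at 60%, while the third comes out lower even though the gap matches the first pair, which is why subtracting a baseline rating and multiplying by playing time does not work here.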

Second, there is no immediately obvious baseline to select for comparison. It is comparatively easy to define, say, average performance in baseball or basketball; you simply add up the performance of all the players or teams in the league. In tennis, when considering data taken from multiple levels of competition and players who are often given berths in tournaments for crowd-pleasing rather than merit-based reasons, defining a baseline becomes much murkier.

The basis for the approach that I’ve selected comes from trying to answer the following question: Is it more impressive to play like the tenth-best player in the world in 12 tournaments, or the twentieth-best player in the world in 25 tournaments?

It may be easier to understand the method for answering this question if the question itself is rephrased: How difficult is it to play like the #20 player for 25 events? Or, to put another spin on it, what would be the chances of a relatively low-level player matching the performance of #20 across the same matches? Phrased like that, it becomes a question that can be approached statistically.

2011 was used as the example year in the previous post, and I’ll stick with it now. In 2011, Stanislas Wawrinka played in 1419 service games, and won 745 of them. The odds of a coin coming up heads exactly 745 times in 1419 flips can be calculated using the binomial distribution (they are 0.36%, at least with a fair coin). But Wawrinka’s performance was not registered in a series of 50-50 tests; his odds were much worse than that when facing Novak Djokovic or Roger Federer, and better when facing David Goffin or Thomas Schoorel (he actually faced all of these players in 2011). So a bit more sophistication is needed here.
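The fair-coin figure can be checked with a few lines of Python. This is only a sketch of the standard binomial formula for exactly k heads in n flips; the function name is my own.

```python
from math import comb


def fair_coin_exact(n: int, k: int) -> float:
    """Probability of exactly k heads in n flips of a fair coin."""
    # comb(n, k) counts the sequences with k heads; each sequence has
    # probability 1 / 2**n. Python integers are exact, so the division
    # is only rounded once, at the end.
    return comb(n, k) / 2 ** n


p_exact = fair_coin_exact(1419, 745)
print(f"P(exactly 745 heads in 1419 flips) = {p_exact:.4%}")
```

The result should land close to the 0.36% quoted above.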

Since we already have a record of the exact draw Wawrinka faced in 2011, along with a measure of the on-court quality of every player who appeared in the top two levels of tennis, it is not an enormously complicated enterprise to project another player’s performance against Wawrinka’s set of opponents. For example, Nikolay Davydenko would be expected to win 51.2% of his service games against the same players Wawrinka faced, and the odds of his out-performing Wawrinka against them would be roughly 17.3% (estimated using the binomial distribution, with the simplifying assumption that the .512 winning percentage applies uniformly to every service game played). Taking a player significantly lower in the ratings, Sam Querrey is projected to win 48% of service games against Wawrinka’s foes, and would be expected to match or exceed Stan’s performance across the same schedule a microscopic 0.035% of the time.
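Here is a rough sketch of that binomial estimate, under the same simplifying assumption of one flat winning percentage for all 1419 service games. The function names are mine, and I am assuming that "out-performing" means winning at least 745 games; if ties are excluded, or if the inputs are less rounded than the percentages quoted here, the output will differ slightly from the figures in the text.

```python
from math import exp, lgamma, log


def binom_pmf(n: int, k: int, p: float) -> float:
    """P(exactly k successes in n trials), computed in log space for stability."""
    log_pmf = (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1 - p)
    )
    return exp(log_pmf)


def prob_at_least(n: int, k: int, p: float) -> float:
    """P(at least k successes in n trials) by summing the upper tail."""
    return sum(binom_pmf(n, j, p) for j in range(k, n + 1))


GAMES_PLAYED, GAMES_WON = 1419, 745  # Wawrinka's 2011 service games

# Projected win rates against Wawrinka's opponents, as quoted in the text.
for name, rate in [("Davydenko", 0.512), ("Querrey", 0.48)]:
    chance = prob_at_least(GAMES_PLAYED, GAMES_WON, rate)
    print(f"{name}: {chance:.4%} chance of at least {GAMES_WON} wins")
```

A more faithful version would give each service game its own probability based on the opponent faced, which turns the count into a sum of unequal coin flips rather than a plain binomial.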

The question then becomes: Which other player is the right one to use for comparison?

At the moment, I do not have a terribly scientific answer; attempting to develop a rigorous approach to this issue may be something to revisit in the future. My method of baseline selection was nothing more involved than “try a few things and see which one looks best.” The thing that looked best was the 75th-best on-court player in the world. Putting that player (who happened to be Steve Darcis in 2011) through Wawrinka’s draws gives an expected service game winning percentage of .468, and a .000945% chance of bettering Stan's work.

The odds of being out-performed by the #75 player in the world make a pretty good ranking system – they combine ability and playing time (since fluke results from a low-rated player are more likely to occur in a smaller sample of play), and effectively account for draw faced. However, there are a few issues. First, they create an artificial hitch in the rankings at #75 (or whatever baseline ranking you select) – the player who rates as 74th-best when on the court will never finish behind the player who rates 76th-best, even if the latter played twice as many matches at a nearly identical level. And second, they do not effectively differentiate between players ranked significantly lower than 75th, because they all have roughly the same chance, close to 100%, of being out-performed by #75.

The first issue is inherent to the method. The identity of the 75th-best player in the world in a given year is not a question that holds a tremendous amount of import for me, and I suspect the same is true of most people who are interested in tennis rankings; as such, I am willing to accept this as a flaw and simply keep it in mind when examining players ranked around this spot.

The second issue, however, is one that can be addressed more easily. It involves a small bit of mathematical wrangling and a second application of the binomial distribution to the other side of the odds, as follows:

Player rating = (Odds of the player out-performing the #75 player) / (Odds of #75 out-performing him)

The denominator is microscopically small for the very best players in the world, while the numerator is for all intents and purposes 1; this results in colossally large scores. For significantly poorer players, the numerator is tiny and the denominator is essentially 1, producing minuscule fractions. The ordinal ranking of the players will be the same, but the differences on the lower levels will be much more visible.

For Wawrinka in 2011, this ratio is 105,864 and change. For many of the players ranked ahead of him, it is much, much larger. So in order to keep the numbers manageable and to avoid the sense of false precision offered by numbers like 105,864, the final step I've used is to take the base-10 logarithm of the result, giving Wawrinka a much simpler-looking score of 5.0.
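As a sanity check on the arithmetic, the sketch below rebuilds Wawrinka's score from the rounded figures in this post. It assumes his numerator (the odds of out-performing #75) is effectively 1, and uses the 0.000945% chance of #75 bettering him as the denominator. The function name is mine, and because the inputs are rounded the ratio comes out near, not exactly at, 105,864.

```python
from math import log10


def season_score(p_player_beats_baseline: float,
                 p_baseline_beats_player: float) -> float:
    """Base-10 log of the ratio between the two out-performance odds."""
    return log10(p_player_beats_baseline / p_baseline_beats_player)


# Assumption: numerator treated as 1.0 for a player this far above #75.
numerator = 1.0
denominator = 0.000945 / 100  # 0.000945% expressed as a probability

ratio = numerator / denominator
score = season_score(numerator, denominator)
print(f"ratio = {ratio:,.0f}, score = {score:.1f}")
```

Swapping the two arguments flips the sign of the score, so a player who is as far below the baseline as Wawrinka is above it would score about -5.0.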

With the method now in place, the fun part comes next: using it. Over the next several posts, we’ll start with the 2011 rankings and jump around through the five years of data I’ve assembled, taking no small number of detours along the way, and see what we can find out about both the best tennis players in the world and the method itself.
