Monday, January 6, 2025

Introducing Weighted WAR

The last several decades have seen a veritable explosion of baseball statistical analysis, both by fans and by actual MLB teams. One of the primary products of this effort has been Wins Above Replacement (WAR), the general estimate of the number of wins a particular player provided in comparison to a freely available substitute. The merits of WAR have been debated ad nauseam since its introduction, but its appeal is obvious; it combines all of a player’s contributions (or at least, all that we have measurements for) into a single number that allows direct comparisons between players on different teams, at different positions, or from different time periods. For all its faults, WAR at least serves as a reasonable estimate and a good place to begin discussion.

It is not, however, the end of the discussion. I have been experimenting with a series of adjustments to baseline WAR totals which I feel lead to a more reasonable overall rating of baseball players throughout history, and will be presenting both those adjustments and some of the results over a series of upcoming posts. The adjustment I started with is a method of combining peak and career value into one number.

Career value is a straightforward concept – just add up the WAR totals from each of the player’s individual seasons and you get an estimate of the number of wins he added to his teams over the course of his career. Peak is more nebulous. It can be broadly defined as the player’s performance in his best seasons, but the follow-up question is obvious: how many seasons constitute a peak?

Various analysts have answered this question in various ways. In the New Historical Abstract, Bill James uses both best three seasons and best five consecutive seasons. The JAWS system, developed by Jay Jaffe and currently published by Baseball Reference, uses the best seven seasons. One could reasonably argue that a true peak should just be the player’s best season; one could also make a case for a top 10 measure, trying to capture a player’s extended prime rather than his absolute pinnacle.

None of these answers is necessarily wrong, but each of them presents a different version of the same problem. Whatever cutoff you use when defining a player’s peak will arbitrarily benefit a subset of players who have exactly the right number of outstanding seasons.

If you focus only on a player’s single best season, Norm Cash does extremely well, with 9.2 Baseball Reference WAR (bWAR) in 1961. His second-best season was 1965, with a comparatively unimpressive 5.4. If you go for top two years, you’ll be fairly fond of Roger Maris, whose best years are a 7.5 and a 6.9 (which won him back-to-back MVPs), followed by a drop to 3.8. Top 3? Carl Yastrzemski looks spectacular (12.5, 10.5, 9.5); never mind the subsequent drop to 6.6.

The problem is mitigated somewhat if you increase the number of seasons, but it never goes away entirely. Ernie Banks has a terrific top four seasons (10.2, 9.3, 8.1, 7.9); his next four entries drop off by about one win each (6.7, 5.3, 4.6, 3.5). He looks gradually worse as your peak consideration set increases from four years to eight. Speaking of shortstops, Nomar Garciaparra has six years between 6.1 and 7.4 bWAR; his seventh-best season is a 2.5. Troy Tulowitzki is in the same boat, with a top six between 5.0 and 6.8, then a drop to 3.2. It’s not a new phenomenon either; George Sisler had a tremendous six-year peak from 1917-22 (all between 5.7 and 9.7 bWAR), plus a very solid 4.1 in 1916. After that, he developed double vision and was never the same, failing to exceed 2.7 bWAR in any of his remaining seven seasons.

How can we avoid the problem of arbitrarily favoring a particular subset of players that benefits from choosing a particular number of peak seasons? Simple: Don’t choose a particular number. Instead, take the average of (best season), (best two seasons), (best three seasons), and so on. You still have to end at some arbitrary point (I’m ending it after 19 years and adding total career rating as a twentieth term in the average), but the effect of the arbitrariness is greatly reduced. (A vast majority of players don’t play 19 seasons at all; even among those who do, very few of them have 19 GOOD seasons. By my count, only one position player in baseball history has over 20 seasons with noteworthy positive WAR values, and he’s been dead for over a century.)

Fortunately, it is not necessary to calculate totals for the top 2 and top 3 and top 11 seasons for each player. The average presented above is mathematically equivalent to the following, which is much easier to calculate:

Weighted WAR = (best season) + 0.95*(2nd-best season) + 0.9*(3rd-best season) + … + 0.1*(19th-best season) + 0.05*(sum of all seasons past the 19th).

This is the formula whose results we’ll be examining moving forward. It is intended to present a reasonable hybrid of peak, prime and career value; I think it works quite well for this purpose, but the results will be left to the reader’s evaluation.
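For readers who want to sanity-check the equivalence claim, here is a short sketch in Python. It computes the rating both ways – as the average of the twenty terms (top-1 sum through top-19 sum, plus career total) and via the closed-form weights – and confirms they match. The sample career line is hypothetical, not any real player's record.

```python
def weighted_war_average(seasons):
    """Average of 20 terms: top-1 sum, top-2 sum, ..., top-19 sum, career total."""
    s = sorted(seasons, reverse=True)
    terms = [sum(s[:k]) for k in range(1, 20)]  # best 1 through best 19 seasons
    terms.append(sum(s))                        # full career as the 20th term
    return sum(terms) / 20

def weighted_war_closed_form(seasons):
    """Equivalent: 1.00*best + 0.95*2nd + ... + 0.10*19th + 0.05*(rest)."""
    s = sorted(seasons, reverse=True)
    total = sum((1 - 0.05 * i) * war for i, war in enumerate(s[:19]))
    return total + 0.05 * sum(s[19:])

# Hypothetical career line for illustration.
career = [7.5, 6.9, 3.8, 3.5, 2.9, 2.2, 1.4, 0.8]
assert abs(weighted_war_average(career) - weighted_war_closed_form(career)) < 1e-9
```

The equivalence holds because the player's i-th best season (for i ≤ 19) appears in the top-k sums for every k ≥ i plus the career total, or (21 − i) of the 20 averaged terms, which gives weights of 20/20, 19/20, …, 2/20, and 1/20 for everything past the 19th season.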

The weighting formula itself, however, is just one of many adjustments that will be presented. We’ll be exploring the others while also presenting the overall results of the evaluation via the medium of top 100 lists at each non-pitching position.

A few additional notes on the project: First, which players are considered for the rankings? The ideal answer would be “all of them,” but sadly, the time and computing power available to me are both limited, so I had to pick a cutoff of some kind. I ended up entering every player who has at least one (schedule adjusted) 3-WAR season. That gives me a set of over 2400 players, including over 250 at every non-DH position. It also covers all but two of the top 1000 players by total bWAR (for position players); I entered both of them into the database just in case, and neither one made the top 100 at his position. It is still possible that I’ve missed someone who deserves a top 100 spot, but I think it's unlikely.

As a housekeeping note, one of the additional adjustments made in order to calculate weighted WAR is both obvious and (hopefully) uncontroversial. If a player is traded midseason, his WAR totals for each part of the season are added together before the weighting is done. This seems barely worthy of mention except for the fact that Baseball Reference has not always done this automatically in the WAR tables presented on its player pages. As of last year, if you simply scanned Mark Teixeira’s page, he appeared to have a stretch in the middle of his career including seasons of 2.6, 2.0, 4.1 and 3.7 WAR. In actuality, Teixeira was traded in both 2007 and 2008, which obscured his much better seasonal totals of 4.6 and 7.8 WAR.
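The pre-weighting aggregation step is simple to express in code: group per-stint WAR values by season and sum within each year. In this sketch, the assignment of each partial-season value to a particular team is my assumption for illustration; only the combined 2007 and 2008 totals (4.6 and 7.8) come from the Teixeira figures cited above.

```python
from collections import defaultdict

# Per-stint rows as (year, team, war); team assignments are illustrative.
stints = [
    (2007, "TEX", 2.6), (2007, "ATL", 2.0),
    (2008, "ATL", 4.1), (2008, "LAA", 3.7),
]

# Combine a traded player's partial seasons into single-year totals
# before applying the weighting formula.
season_war = defaultdict(float)
for year, _team, war in stints:
    season_war[year] += war

assert round(season_war[2007], 1) == 4.6
assert round(season_war[2008], 1) == 7.8
```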

With the exception of Negro League data (to be discussed in more depth later), all WAR values presented will be taken from Baseball Reference. This is not because bWAR is necessarily the best system; it is, however, readily available for all of recorded baseball history, from 1871 to the present. (This is also true of FanGraphs WAR; I have a slight preference for bWAR as its methodology is more clearly explained on the site, particularly for older players. Every other WAR source I am aware of has missing seasons.)

Finally, it should be noted that while the weighting formula and the other adjustments presented here are my work, the positional top 100 lists themselves are not my own personal top 100 lists. I would tend toward using a combination of WAR systems in order to mitigate the idiosyncrasies of any individual measure, and even with that in mind, all WAR systems will still omit factors that I would consider (some of which will also be discussed as we go through the rankings). So don’t sweat it if you disagree with the system’s outcomes; so do I. This is less an exercise intended to produce unassailable results than it is an exploration of what we can accomplish using a single WAR system as a starting point.

With all that in mind, next up will be our first ranking list (third base), combined with a discussion of how to adjust for schedule length.
