The last several decades have seen a veritable explosion of baseball statistical analysis, both by fans and by actual MLB teams. One of the primary products of this effort has been Wins Above Replacement (WAR), the general estimate of the number of wins a particular player provided in comparison to a freely available substitute. The merits of WAR have been debated ad nauseam since its introduction, but its appeal is obvious; it combines all of a player’s contributions (or at least, all that we have measurements for) into a single number that allows direct comparisons between players on different teams, at different positions, or from different time periods. For all its faults, WAR at least serves as a reasonable estimate and a good place to begin discussion.
It is not, however, the end of the discussion. I have been
experimenting with a series of adjustments to baseline WAR totals which I feel
lead to a more reasonable overall rating of baseball players throughout
history, and will be presenting both those adjustments and some of the results
over a series of upcoming posts. The adjustment I started with is a method of
combining peak and career value into one number.
Career value is a straightforward concept – just add up the
WAR totals from each of the player’s individual seasons and you get an estimate
of the number of wins he added to his teams over the course of his career. Peak
is more nebulous. It can be broadly defined as the player’s performance in
his best seasons, but the follow-up question is obvious: how many seasons constitute
a peak?
Analysts have answered this question in different ways. In
the New Historical Abstract, Bill James uses both best three seasons and best
five consecutive seasons. The JAWS system, developed by Jay Jaffe and currently published by Baseball Reference, uses the best seven. One could reasonably argue
that a true peak should just be the player’s best season; one could also make a
case for a top 10 measure, trying to capture a player’s extended prime rather
than their absolute pinnacle.
None of these answers is necessarily wrong, but each of them
presents a different version of the same problem. Whatever cutoff you use when
defining a player’s peak will arbitrarily benefit a subset of players who have
exactly the right number of outstanding seasons.
If you focus only on a player’s single best season, Norm
Cash does extremely well, with 9.2 Baseball Reference WAR (bWAR) in 1961. His
second-best season was 1965, with a comparatively unimpressive 5.4. If you go
for top two years, you’ll be fairly fond of Roger Maris, whose best years are a
7.5 and a 6.9 (which won him back-to-back MVPs), followed by a drop to 3.8. Top
3? Carl Yastrzemski looks spectacular (12.5, 10.5, 9.5); never mind the
subsequent drop to 6.6.
The problem is mitigated somewhat if you increase the number
of seasons, but it never goes away entirely. Ernie Banks has a terrific top
four seasons (10.2, 9.3, 8.1, 7.9); his next four entries drop off by about one
win each (6.7, 5.3, 4.6, 3.5). He looks gradually worse as your peak
consideration set increases from four years to eight. Speaking of shortstops,
Nomar Garciaparra has six years between 6.1 and 7.4 bWAR; his seventh-best
season is a 2.5. Troy Tulowitzki is in the same boat, with a top six between
5.0 and 6.8, then a drop to 3.2. It’s not a new phenomenon either; George
Sisler had a tremendous six-year peak from 1917-22 (all between 5.7 and 9.7
bWAR), plus a very solid 4.1 in 1916. After that, he developed double vision
and was never the same, failing to exceed 2.7 bWAR in any of his remaining
seven seasons.
How can we avoid the problem of arbitrarily favoring a
particular subset of players that benefits from choosing a particular number of
peak seasons? Simple: Don’t choose a particular number. Instead, take the
average of (best season), (best two seasons), (best three seasons), and so on.
You still have to end at some arbitrary point (I’m ending it after 19 years and
adding total career rating as a twentieth term in the average), but the effect
of the arbitrariness is greatly reduced. (The vast majority of players don’t play
19 seasons at all; even among those who do, very few of them have 19 GOOD
seasons. By my count, only one position player in baseball history has over 20
seasons with noteworthy positive WAR values, and he’s been dead for over a
century.)
Fortunately, it is not necessary to calculate totals for the
top 2 and top 3 and top 11 seasons for each player. Each season appears in a
predictable number of the 20 averaged terms: the best season appears in all 20,
the second-best in 19 (every term except the first), and so on. The average
presented above is therefore mathematically equivalent to the following, which
is much easier to calculate:
Weighted WAR = (best season) + 0.95*(season 2) + 0.9*(season 3) + … + 0.1*(season 19) + 0.05*(all seasons past 19).
This is the formula whose results we’ll be examining moving
forward. It is intended to present a reasonable hybrid of peak, prime, and
career value; I think it works quite well for this purpose, but the results will
be left to the reader’s evaluation.
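The equivalence between the 20-term average and the weighted formula is easy to check numerically. Here is a minimal Python sketch that computes weighted WAR both ways; the function names and the sample career are my own illustrations, not part of the system itself:

```python
def weighted_war_average(seasons):
    """Average of (best 1), (best 2), ..., (best 19) season totals,
    with the full career total as the twentieth term."""
    s = sorted(seasons, reverse=True)
    terms = [sum(s[:k]) for k in range(1, 20)]  # top-1 through top-19 sums
    terms.append(sum(s))                        # total career as term 20
    return sum(terms) / 20

def weighted_war_direct(seasons):
    """Equivalent closed form: weights 1.0, 0.95, 0.90, ..., 0.10 on the
    best 19 seasons, then 0.05 on everything past season 19."""
    s = sorted(seasons, reverse=True)
    total = sum((1 - 0.05 * i) * war for i, war in enumerate(s[:19]))
    total += 0.05 * sum(s[19:])
    return total

# A hypothetical 22-season career, so the 0.05 tail weight is exercised
seasons = [7.1, 6.4, 6.0, 5.5, 5.2, 4.8, 4.5, 4.1, 3.8, 3.5, 3.1,
           2.8, 2.5, 2.2, 1.9, 1.6, 1.3, 1.0, 0.7, 0.4, 0.2, 0.1]

print(weighted_war_average(seasons))
print(weighted_war_direct(seasons))
```

Both functions return the same value, and the equivalence holds for careers shorter than 19 seasons as well, since every top-k sum past the end of the list simply equals the career total.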
The weighting formula itself, however, is just one of many
adjustments that will be presented. We’ll be exploring the others while also
presenting the overall results of the evaluation via the medium of top 100
lists at each non-pitching position.
A few additional notes on the project: First, which players
are considered for the rankings? The ideal answer would be “all of them,” but
sadly, the time and computing power available to me are both limited, so I had
to pick a cutoff of some kind. I ended up entering every player who has at
least one (schedule adjusted) 3-WAR season. That gives me a set of over 2400
players, including over 250 at every non-DH position. It also covers all but
two of the top 1000 players by total bWAR (for position players); I entered
both of them into the database just in case, and neither one made the top 100
at his position. It is still possible that I’ve missed someone who deserves a
top 100 spot, but I think it's unlikely.
As a housekeeping note, one of the additional adjustments
made in order to calculate weighted WAR is both obvious and (hopefully)
uncontroversial. If a player is traded midseason, his WAR totals for each part
of the season are added together before the weighting is done. This seems
barely worthy of mention except for the fact that Baseball Reference has not
always done this automatically in the WAR tables presented on its player pages.
As of last year, a quick scan of Mark Teixeira’s page would suggest
a stretch in the middle of his career including seasons of 2.6, 2.0, 4.1, and
3.7 WAR. In actuality, Teixeira was traded in both 2007 and 2008, which obscured
his much better seasonal totals of 4.6 and 7.8 WAR.
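Combining the split totals is a simple grouping operation. A minimal sketch, using the Teixeira splits quoted above (the function name and the stint data layout are illustrative, not how Baseball Reference stores anything):

```python
from collections import defaultdict

def combine_stints(stints):
    """Sum a player's partial-season WAR values by year, so a midseason
    trade doesn't split one season into two separate entries."""
    totals = defaultdict(float)
    for year, war in stints:
        totals[year] += war
    return dict(totals)

# The four split values shown on Teixeira's page, tagged by year
teixeira = [(2007, 2.6), (2007, 2.0), (2008, 4.1), (2008, 3.7)]
print(combine_stints(teixeira))
```

Grouping by year recovers the 4.6 and 7.8 WAR seasonal totals, which are what the weighting formula should actually see.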
With the exception of Negro League data (to be discussed in
more depth later), all WAR values presented will be taken from Baseball
Reference. This is not because bWAR is necessarily the best system; it is,
however, readily available for all of recorded baseball history, from 1871 to
the present. (This is also true of FanGraphs WAR; I have a slight preference
for bWAR as its methodology is more clearly explained on the site, particularly
for older players. Every other WAR source I am aware of has missing seasons.)
Finally, it should be noted that while the weighting formula
and the other adjustments presented here are my work, the positional top 100
lists themselves are not my own personal top 100 lists. I would tend toward
using a combination of WAR systems in order to mitigate the idiosyncrasies of
any individual measure, and even with that in mind, all WAR systems will still
omit factors that I would consider (some of which will also be discussed as we go
through the rankings). So don’t sweat it if you disagree with the system’s
outcomes; so do I. This is less an exercise intended to produce unassailable
results than it is an exploration of what we can accomplish using a single WAR system
as a starting point.
With all that in mind, next up will be our first ranking
list (third base), combined with a discussion of how to adjust for schedule
length.