Having reviewed the existing pitcher ratings, talked about Game Score and its adjustments as the basis for a new rating system, introduced the concept behind Game Score Deviations and then the results of the metric itself, and finally explored what a career rating system might look like, it’s time for the fun stuff. Today, we start going through the top 100 starting pitchers of all time.
(Well, of 1901-2022, because those are the years for which I have completed scores. And as has been previously noted, Baseball Reference doesn’t have game-by-game Negro League data, and therefore neither do I. But “top 100 of all time” sounds better.)
As a reminder from the post on career evaluation, we’re looking at a hybrid of two different GSDev-based approaches. First, add up the squares of all of the pitcher’s positive GSDev totals and take the square root of the sum (root sum square, or RSS, for short). Second, take a peak-weighted sum (100% weight for the pitcher’s best season, 95% for second best, and so on), and multiply the result by 0.39 to scale it down to the same range as the RSS values. The pitchers will be ranked by the average of those two numbers, which will be listed as “Combo.”
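For anyone who wants to play along at home, here is a rough Python sketch of that arithmetic. It is not my actual spreadsheet; the function names are made up for illustration, and it assumes two things the description above doesn't spell out: that the weights keep dropping by five points per season until they hit zero, and that only positive seasons go into the weighted sum. The sample career is invented, too.

```python
import math

SCALE = 0.39  # from the text: scales the weighted sum down to the RSS range


def root_sum_square(season_gsdevs):
    """Square root of the sum of squares of the positive seasonal totals."""
    return math.sqrt(sum(g * g for g in season_gsdevs if g > 0))


def peak_weighted_sum(season_gsdevs, step=0.05):
    """Best season at 100%, second best at 95%, and so on.

    Assumptions (not stated in the post): the weight falls by `step`
    per season, never drops below zero, and only positive seasons count.
    """
    ranked = sorted((g for g in season_gsdevs if g > 0), reverse=True)
    return sum(g * max(0.0, 1.0 - step * i) for i, g in enumerate(ranked))


def combo(rss, wt_sum):
    """Average of RSS and the scaled-down weighted sum."""
    return (rss + SCALE * wt_sum) / 2


# Invented seasonal GSDev totals for a hypothetical pitcher
career = [12.0, 9.0, 5.0, -2.0]
rss = root_sum_square(career)
wt_sum = peak_weighted_sum(career)
print(round(rss, 2), round(wt_sum, 2), round(combo(rss, wt_sum), 2))
```

Feeding `combo` the published RSS and WtSum values from the tables below should land on the listed Combo figure, give or take a hundredth where the displayed inputs have already been rounded.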
To get a sense of how GSDev fares against conventional statistical wisdom, I also pulled bWAR totals for the same 1901-2022 period (filtering out relievers to even out the rankings), and we’ll be comparing the bWAR ranks of our top 100 to their GSDev ranks. There are some pretty noteworthy differences, which we’ll explore as we go. But for now, let’s get to the tables, starting with the bottom 10 of the top 100:
| Rank | Pitcher | Years | RSS | WtSum | Combo | bWAR Rnk |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | Madison Bumgarner | 2009-22 | 30.30 | 77.2 | 30.20 | 206 |
| 99 | Paul Derringer | 1931-45 | 29.81 | 78.6 | 30.24 | 149 |
| 98 | Corey Kluber | 2012-22 | 32.29 | 73.3 | 30.43 | 187 |
| 97 | Josh Beckett | 2001-14 | 30.69 | 77.8 | 30.51 | 179 |
| 96 | Jimmy Key | 1985-98 | 30.43 | 79.2 | 30.66 | 89 |
| 95 | John Lackey | 2002-17 | 30.28 | 79.7 | 30.68 | 158 |
| 94 | Catfish Hunter | 1965-79 | 30.97 | 78.4 | 30.77 | 174 |
| 93 | Mickey Lolich | 1963-79 | 30.59 | 80.2 | 30.93 | 95 |
| 92 | Mark Buehrle | 2000-15 | 30.51 | 80.8 | 31.01 | 51 |
| 91 | Frank Viola | 1982-96 | 31.38 | 79.1 | 31.12 | 99 |
When going through ranking lists like this, I can always find something to talk about with a little effort. But sometimes, the numbers make it much easier by providing a narratively perfect result. One such result is the #100 ranking of Madison Bumgarner, who was a good pitcher in the regular season but has a strong argument as the best postseason starting pitcher ever. The numbers are not actually saying “well, if the postseason counts, we should probably include the guy who has the best postseason record, so let’s squeeze him on at the end of the list.” But it kind of feels that way.
Bumgarner is one of the five pitchers from this group who have debuted since 2000. Four of the five are ranked #158 or worse by bWAR. By contrast, Mark Buehrle is bWAR’s #51 pitcher, by far the highest ranking in this group. The difference largely comes down to the peak adjustment, which I’m using for GSDev but not for bWAR. Buehrle has nearly double Kluber’s WAR total (60-34), but Kluber’s peak is significantly higher in both systems (as you might guess from his having more Cy Young awards than Buehrle has seasons in which he received even a single Cy Young vote). Peak adjustment will remain a complicating factor in the ranking comparisons as we move forward.
Moving a bit further back in time, let’s talk Catfish Hunter, whose ranking is guaranteed to make everyone unhappy. WAR is generally not a fan of Hunter, who benefited from playing in pitcher-friendly parks for great teams. However, he had both a fairly high peak and a good deal of success in the playoffs, and his best playoff performances tended to follow his best regular seasons (1972 and ’74 in particular). That being said, 91-100 is generally not Hall of Fame territory; there are fewer than 90 total pitchers in the Hall, and that group includes relievers, Negro Leaguers, and pre-1900 pitchers, all of whom are omitted here. Hunter is immediately behind Mickey Lolich, an exact contemporary who is preferred by both major WAR systems but who also got nowhere near joining Hunter in the Hall of Fame (despite having a pretty remarkable postseason record in his own right). So whether you’re a Hunter fan or a WAR disciple, you can find something to complain about here.
Onward and slightly upward! Let’s move on to numbers 90-81:
| Rank | Pitcher | Years | RSS | WtSum | Combo | bWAR Rnk |
| --- | --- | --- | --- | --- | --- | --- |
| 90 | David Wells | 1987-2007 | 30.70 | 81.4 | 31.21 | 68 |
| 89 | Babe Adams | 1906-25 | 30.86 | 81.2 | 31.26 | 83 |
| 88 | Bartolo Colon | 1997-2018 | 30.68 | 81.6 | 31.26 | 93 |
| 87 | Fernando Valenzuela | 1981-97 | 31.80 | 79.1 | 31.33 | 168 |
| 86 | Bucky Walters | 1934-48 | 32.00 | 80.5 | 31.69 | 106 |
| 85 | Stan Coveleski | 1912-28 | 31.86 | 80.9 | 31.71 | 34 |
| 84 | Gerrit Cole | 2013-22 | 32.69 | 79.6 | 31.87 | 205 |
| 83 | Mark Langston | 1984-99 | 31.90 | 82.2 | 31.98 | 84 |
| 82 | David Price | 2008-21 | 32.09 | 81.9 | 32.01 | 143 |
| 81 | Chuck Finley | 1987-2002 | 31.71 | 83.7 | 32.17 | 54 |
We’ve talked a few times about how GSDev’s interpretation of the early 1900s is far less rosy than bWAR’s. Coveleski is our first encounter with this effect in the career rankings. He was a fine pitcher, scoring as the best in baseball in 1920 and adding two ERA titles outside of that season. GSDev gives him seven seasons graded as an ace, and three in the top quartile (top 4, in his era). But his career isn’t especially long by top-100 standards (11 seasons of 20-plus starts), and like many pre-integration pitchers, his large WAR totals tended to be built on lots of innings rather than game-to-game excellence. The results offer a direct comparison to his contemporary Babe Adams, who does not have the same ranking disparity between WAR and GSDev. Adams tended to miss more time in-season (Coveleski’s fourth-highest innings total exceeded Adams’s second-highest), but counteracted that deficit by being a little better per-game (his career average adjusted Game Score is a point higher, driven by a strikeout-to-walk ratio that nearly doubles Coveleski’s). That is a combination that GSDev can get on board with.
On a related but opposite note, Fernando Valenzuela is our first representative who peaked in the late ’70s or early ’80s, the period with the lowest measured scrub deviations to date. He won’t be the last. Valenzuela had just five ace-level seasons (only two pitchers in the top 100 had fewer), but all five of them were top quartile. He also had an exemplary playoff resume – 8 starts, 63 innings, 14 runs allowed (2.00 ERA). Five of those starts came in 1981, pushing him to 30 starts and 233 total innings for the year; just looking at his numbers, you’d never know it was a strike-shortened season.
Bonus note before we get to the obvious guy: Mark Langston and Chuck Finley shared a rotation in Anaheim for eight years, and end up within a fraction of each other in the rankings, which is fun.
All right, time to talk about Gerrit Cole. By arithmetic difference in ranking (+121), Cole is the pitcher that GSDev thinks bWAR underrates the most of anyone in either system’s top 100. If you take the difference in square roots instead, to account for the fact that the gap from (say) 100 to 80 is very different from the gap from 25 to 5, the biggest disagreement is… still Gerrit Cole.
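Here is a small Python sketch of those two comparisons, using only rank pairs from the tables so far. The function names are invented for this example, and the handful of pitchers included is just a sample, not the full top 100 from either system.

```python
import math

# (GSDev rank, bWAR rank) pairs copied from the tables above
ranks = {
    "Gerrit Cole": (84, 205),
    "Madison Bumgarner": (100, 206),
    "Corey Kluber": (98, 187),
    "Stan Coveleski": (85, 34),
}


def arithmetic_gap(gsdev_rank, bwar_rank):
    """Positive means GSDev ranks the pitcher higher than bWAR does."""
    return bwar_rank - gsdev_rank


def sqrt_gap(gsdev_rank, bwar_rank):
    """Same comparison on square roots, which shrinks gaps far down the list."""
    return math.sqrt(bwar_rank) - math.sqrt(gsdev_rank)


for name, (gsdev, bwar) in ranks.items():
    print(name, arithmetic_gap(gsdev, bwar), round(sqrt_gap(gsdev, bwar), 2))

# Largest disagreement in this sample under the square-root measure
biggest = max(ranks, key=lambda name: sqrt_gap(*ranks[name]))
print(biggest)
```

The square root is just one convenient way to compress the far end of the list; any concave transform of rank would make a similar point.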
Cole’s career through 2022 is pretty peak-heavy; you can see that by looking at his RSS score, which is the highest we’ve seen from 81-100 (RSS is more peak-friendly than the weighted sum in most cases). This makes sense, because Cole’s 2019 has the highest single-season GSDev we’ve encountered on the list so far, and I believe it will only be surpassed by one other entrant over the duration of the 100-51 segment of the rankings. And he complements that with five other ace seasons, three of which placed in their seasonal top 10s. (If #29 all-time seems high for Cole’s 2019, a year in which he did not win the Cy Young, don’t forget our old friend the postseason: five starts, 4-1, 1.72 ERA in 36.2 innings, 47 strikeouts to 11 walks. Very nice addition to an already outstanding regular season.)
One more Cole-related note: By sheer coincidence, 2022 appears to be a very nice cutoff point for a top-100 ranking of starting pitchers. This is true for two reasons. First, there weren’t many pitchers active in 2022 who were on the verge of cracking the list; best I can tell with the preliminary numbers I have so far, nobody looks likely to join the top 100 when 2023 and 2024 are added. And second, of the pitchers active in ’22 who were already in the top 100, only two of them appear to have improved their standing by more than a few places over the subsequent two seasons. One of those is Cole, who finally earned his long-awaited Cy Young in 2023. We’ll encounter the other shortly.
On to the next group, 80 to 71:
| Rank | Pitcher | Years | RSS | WtSum | Combo | bWAR Rnk |
| --- | --- | --- | --- | --- | --- | --- |
| 80 | Tommy Bridges | 1930-46 | 32.03 | 83.3 | 32.25 | 75 |
| 79 | Tommy John | 1963-89 | 32.19 | 83.5 | 32.38 | 43 |
| 78 | Javier Vazquez | 1998-2011 | 32.46 | 83.2 | 32.46 | 126 |
| 77 | Eppa Rixey | 1912-33 | 31.99 | 85.4 | 32.65 | 57 |
| 76 | Lefty Gomez | 1930-43 | 33.62 | 81.9 | 32.78 | 128 |
| 75 | Jacob deGrom | 2014-22 | 33.81 | 81.5 | 32.79 | 140 |
| 74 | Red Faber | 1914-33 | 32.43 | 85.2 | 32.83 | 33 |
| 73 | Frank Tanana | 1973-93 | 32.90 | 85.6 | 33.14 | 58 |
| 72 | Early Wynn | 1939-63 | 32.74 | 86.7 | 33.27 | 76 |
| 71 | Vida Blue | 1969-86 | 33.18 | 86.1 | 33.38 | 117 |
Javier Vazquez grades out as a top-80 pitcher since 1901 by GSDev; bWAR disagrees but still has him in the top 150. Vazquez last pitched in 2011, so his first eligibility for the Hall of Fame would have been in 2017. He was not offered as an option on the ballot.
I’m not saying I’d vote for Vazquez for the Hall of Fame. But being included on the ballot is practically guaranteed if you have at least 10 years in the majors (Vazquez, as you can see above, had 14). The 2017 ballot featured guys like Arthur Rhodes, a reliever with 33 career saves (and 15 career WAR), and Matt Stairs, a pretty good hitter who first exceeded 100 at-bats at age 28, and whose teams kept trying to play him in the field even though he was the personification of the DH role (the fielding kept him down to 14 career WAR). But not Javier Vazquez. He is probably the biggest omission from the Hall of Fame ballot since it assumed its current form.
Outside of Vazquez (and Jacob deGrom, whose two Cy Young awards assure that Cooperstown voters will at least get to consider him when his time comes), this is predominantly a list of old-timers; four of the ten pitchers in this group retired before 1950. As such, it’s probably time to discuss how pitcher usage has changed over time, and the effect those changes have on the numbers.
There have been two major shifts in the usage of ace pitchers over the past century. First, starters are no longer expected to throw large numbers of complete games. In 1917, for instance, about 55% of all starts were completed. No single pitcher has completed half of his starts in a season (with a 10-start minimum) since 1988. Five individual pitchers in 1917 had at least 29 complete games; MLB as a whole in 2024 combined for 28.
Second, deadball aces often pitched in relief between starts. To go back to our modern pitchers from this set, Vazquez had seven relief outings in his career (plus two in the playoffs); through 2025, deGrom has none. By comparison, Red Faber had at least 19 starts in each of his first four MLB seasons, but also had 10 or more relief outings in all four of those years. Rixey had three seasons with similar totals of starts and relief outings. Gomez and Bridges relieved less (although still intermittently), but even Gomez (who had the fewest relief outings of our four old-timers in this set) still came out of the bullpen 48 times in his career, nearly all of them in seasons spent as a majority starting pitcher. This captures an overall league-wide trend. If I count correctly, there are nine seasons since 1901 in which the same pitcher led his league in complete games and saves; seven of these came in 1910 or earlier, and the last of them was in 1936 (Dizzy Dean).
This combination resulted in pitchers compiling workloads that would be inconceivable today. In the first 20 years of the AL/NL era, only one of the 42 league seasons (including the two Federal League years) didn’t have a single pitcher who threw 300 innings, and even that single instance wouldn’t have happened without a world war (Hippo Vaughn’s 290.1 innings led the abbreviated 1918 NL campaign). By contrast, the last 300-inning regular season came in 1980, and no pitcher has managed a seasonal total as high as 240 in over a decade. Of the four pre-integration pitchers in this group, probably the least durable was Tommy Bridges, who never led the league in innings during his career. Even so, Bridges had five consecutive seasons (1933-37) whose innings totals would each have led the majors in every season since 2015.
What effect do these changes have on the rankings? The first is obvious; GSDev looks at starts only, so the pitchers who did significant relief work are having part of their value ignored. Red Faber pitched about 11% of his total innings as a reliever; he’s probably not underrated by the full 11%, but if you want to push him upward by half of that amount, I won’t object. The additional innings pitched also have an indirect effect on how the system looks at the old-timers, because in order to stay fresh enough to throw 300 innings in a season, the aces of a century ago would intentionally take it easy until they got into trouble. We know this because the pitchers themselves tell us – Christy Mathewson’s autobiography, Pitching in a Pinch, is named after this common practice. This means the pitchers were sacrificing some amount of their batter-by-batter performance for the sake of higher volume. And since GSDev doesn’t prioritize volume as heavily as WAR does, it will punish pitchers who make this tradeoff.
This is a very small part of a larger discussion of how to evaluate pitchers across eras. The changes in baseball over the last century-plus are numerous and diverse, and accounting for all of them is near-impossible. Ultimately, while the above points about the treatment of older pitchers carry some weight, I think they are more than counterbalanced by the expansion in both the league and the talent pool over the timespan we’re examining here. As such, I think GSDev’s mildly modernist leanings in the rankings are likely appropriate (even though they weren’t a designed feature).
That’s enough about older pitchers for now; let’s move on to the next group:
| Rank | Pitcher | Years | RSS | WtSum | Combo | bWAR Rnk |
| --- | --- | --- | --- | --- | --- | --- |
| 70 | Mordecai Brown | 1903-16 | 33.39 | 85.6 | 33.38 | 60 |
| 69 | Adam Wainwright | 2007-22 | 33.10 | 86.5 | 33.42 | 131 |
| 68 | Rick Reuschel | 1972-91 | 32.81 | 87.3 | 33.43 | 31 |
| 67 | Roy Oswalt | 2001-13 | 33.60 | 85.4 | 33.46 | 87 |
| 66 | Dennis Martinez | 1976-98 | 32.82 | 87.7 | 33.51 | 88 |
| 65 | Chris Sale | 2012-22 | 34.61 | 83.3 | 33.54 | 111 |
| 64 | Orel Hershiser | 1984-2000 | 33.35 | 87.5 | 33.74 | 77 |
| 63 | Jack Morris | 1977-94 | 33.56 | 89.0 | 34.14 | 123 |
| 62 | Cliff Lee | 2002-14 | 34.78 | 86.0 | 34.16 | 134 |
| 61 | Jon Lester | 2006-21 | 33.95 | 89.1 | 34.34 | 124 |
Did I say that was enough about older pitchers? Of the nine seasons mentioned above in which a pitcher led his league in both complete games and saves, Mordecai Brown had two of them (1909-10).
All right, elephant in the room time. Jack Morris is significantly higher in these rankings than would likely be expected from a stats-based method. We’ve talked about how the late ’70s and early ’80s were a low-deviation period, and Morris isn’t the only pitcher who benefits from that (Vida Blue and Fernando Valenzuela have also significantly outperformed their WAR rankings). But then how on earth is Morris ahead of Rick Reuschel, a relative contemporary whom WAR ranks 90-plus spots in front of him?
If you rank Morris and Reuschel’s best seasons by bWAR, Reuschel has six of the top seven. Switching to fWAR doesn’t change much; Reuschel pulls five of the top six, although the margins are narrower. GSDev, meanwhile, gives Reuschel two of the top three… but Morris seven of the top ten.
The difference seems to come down to three factors. First, there’s fielding evaluation in peak seasons. GSDev lists 1983 as easily Morris’s best year, at 12.39. You can see the appeal; he led the AL in innings and strikeouts, finished second in strikeout-to-walk ratio, and cracked the top 10 in ERA and FIP (the latter number being a career-best 3.38). Detroit’s fielders graded out well in ’83 (0.47 runs per 9 above average), but Morris seems to have benefited less than most of his teammates (both by FIP-ERA disparity and by BABIP). While GSDev ranks Morris’s ’83 as the fourth-best season in the majors that year, bWAR has him tied for #21. On the other hand, Reuschel’s best season was 1977, and his disparity runs the other way, with fielders 0.27 runs per 9 below average, but an ERA 0.24 runs better than his already-excellent FIP. As a result, bWAR sees Reuschel as the best pitcher in baseball in ’77, while GSDev places him at #10. In Morris’s case, this type of disparity persists through his career, and as a result, fWAR has him roughly a dozen wins higher than bWAR (albeit still well behind Reuschel).
Second, there’s in-game durability. Morris and Reuschel made nearly identical numbers of regular season starts in their careers (527 and 529, respectively), but Morris threw about 275 more innings, averaging 7.1 innings per start to Reuschel’s 6.6. This serves as a counterbalance to Reuschel’s allowing fewer runs (even when controlling for league context). Their career average adjusted Game Scores were 53.0 for Morris and 53.5 for Reuschel, and about half of Reuschel’s advantage is accounted for by their slightly different league contexts.
The third factor is, as always, differences in league deviation. This is a bigger factor than you might expect given how heavily Morris and Reuschel’s careers overlapped. But Reuschel missed a lot of time in the early ’80s (18 combined starts from 1982-84), and thus didn’t benefit from as many low-deviation seasons. Also, the bits of their careers that didn’t overlap saw Reuschel in the fairly high-deviation early ’70s, and Morris in the comparatively normal early ’90s.
Side note – if you were expecting me to bring up Morris’s postseason success, yes, that probably is a small factor, especially because Reuschel struggled in the playoffs. But ultimately the two of them combined for 20 postseason starts compared to 1056 in the regular season, and while Morris certainly had his moments, he also had a couple of less-impressive playoff outings. I’m confident they would still be effectively tied on regular season record alone.
That’s quite a bit of commentary without mentioning the five post-2000 pitchers in this group, only one of whom is top-100 by bWAR. But I’m not sure there’s much to say beyond another refrain of the song we’ve been singing for this entire post; GSDev treats the lower volume of the modern pitching season more gently than WAR does. Also, as promised, this is where we find the other pitcher who’s improved his position by a substantial margin since 2022. Unlike Gerrit Cole, whose 2023 Cy Young felt like the natural conclusion to a long stretch of excellence, Chris Sale’s 2024 Cy was thoroughly unexpected, coming off of half a dozen injury-plagued years. But expected or no, Sale’s recent efforts should give him quite a boost once 2023-24 numbers are finalized.
Let’s finish up this extra-long post with one more group, 60-51:
| Rank | Pitcher | Years | RSS | WtSum | Combo | bWAR Rnk |
| --- | --- | --- | --- | --- | --- | --- |
| 60 | Billy Pierce | 1948-64 | 34.04 | 89.5 | 34.46 | 70 |
| 59 | Jerry Koosman | 1967-85 | 33.91 | 90.6 | 34.63 | 59 |
| 58 | Kevin Appier | 1989-2004 | 34.78 | 89.3 | 34.81 | 66 |
| 57 | Ted Lyons | 1923-46 | 34.28 | 91.0 | 34.88 | 35 |
| 56 | Andy Pettitte | 1995-2013 | 34.51 | 90.6 | 34.93 | 48 |
| 55 | Ron Guidry | 1975-88 | 35.42 | 88.4 | 34.95 | 96 |
| 54 | Eddie Plank | 1901-17 | 34.65 | 91.0 | 35.07 | 13 |
| 53 | Steve Rogers | 1973-85 | 35.32 | 89.7 | 35.16 | 114 |
| 52 | Whitey Ford | 1950-67 | 34.78 | 92.1 | 35.34 | 69 |
| 51 | Cole Hamels | 2006-20 | 35.13 | 92.4 | 35.59 | 55 |
OK, we just did a detailed one-to-one comparison, so I’ll spare you Steve Rogers vs. Eddie Plank, which mostly boils down to Plank’s low peak (11 ace seasons but only 3 in the top half) and the difference in deviation between their eras, which we’ve gone through quite a few times already. It is notable that Plank is the only member of the bWAR top 30 to fall short of the GSDev top 50, and his bWAR rank of #13 clears that top-30 bar with room to spare. Meanwhile, in a mild spoiler, Rogers is the highest-ranked pitcher in GSDev to miss the bWAR top 100.
Having touched on the two pitchers whose evaluations diverge wildly between systems, it’s worth pointing out that they are the exception for this group of pitchers; half of this set of ten have GSDev and bWAR rankings within 10 places of each other. That includes Jerry Koosman ranking exactly at #59 in both systems, one of two pitchers in the top 100 of whom that is true (we’ll see the other one higher up).
That takes us through the bottom half of the top 100. The scores look admittedly close together throughout the whole collection, with Hamels’s margin over Bumgarner (35.59 to 30.20) being less than 20%. So let’s close out by going back to seasonal ordinal rankings for another comparison. Here are each of our five groups of 10, listed by ace seasons, top-half and top-quartile ace seasons, and #1 seasons:
| Group | Ace | THA | TQA | No1 |
| --- | --- | --- | --- | --- |
| 100-91 | 54 | 33 | 17 | 2 |
| 90-81 | 68 | 41 | 21 | 3 |
| 80-71 | 60 | 37 | 24 | 5 |
| 70-61 | 71 | 53 | 22 | 2 |
| 60-51 | 81 | 45 | 20 | 4 |
If you look from one group to the next, the difference isn’t always immediately obvious. But if you contrast the groups on either end, it clarifies quite a bit more. You can also match up pairs of contemporaries between the end groups (Lyons vs. Derringer, Pettitte vs. Beckett, Koosman vs. Lolich, Hamels vs. Bumgarner), and the advantages are tough to argue with in each case.
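To put a number on that end-to-end contrast, here is a quick Python sketch that turns the group totals from the table into per-pitcher averages. The variable and function names are made up for this example; the only inputs are the table values and the fact that each group holds ten pitchers.

```python
PITCHERS_PER_GROUP = 10
LABELS = ("Ace", "THA", "TQA", "No1")

# Totals per group of ten, copied from the table above
group_totals = {
    "100-91": (54, 33, 17, 2),
    "90-81": (68, 41, 21, 3),
    "80-71": (60, 37, 24, 5),
    "70-61": (71, 53, 22, 2),
    "60-51": (81, 45, 20, 4),
}


def per_pitcher(group):
    """Average count per pitcher for each column in one group."""
    totals = group_totals[group]
    return {label: total / PITCHERS_PER_GROUP for label, total in zip(LABELS, totals)}


def end_group_gap():
    """Per-pitcher advantage of the 60-51 group over the 100-91 group."""
    low, high = per_pitcher("100-91"), per_pitcher("60-51")
    return {label: round(high[label] - low[label], 2) for label in LABELS}


print(per_pitcher("100-91"))
print(per_pitcher("60-51"))
print(end_group_gap())
```

The ace-season column carries most of the separation between the two end groups; the top-quartile and #1 columns barely move.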
Still, though, we have yet another reminder of the relatively small differences you tend to find in the back half of a top-100 list. Next time, we’ll cover pitchers 50 through 11 and see if we can’t find a bit more breathing room between some all-time greats.