So far in this series, we’ve used the GSDev rating system to examine the best starting pitching seasons and careers (over the course of three parts) since the founding of the American League. Now, we’ll be looking at the best pitchers year by year, and doing so in comparison to the systems whose deficiencies inspired us to take a fresh look in the first place: bWAR and fWAR.
We have 122 seasons of essentially complete GSDev data, stretching from 1901 through 2022. When considering the best pitcher according to all three systems, we can split those seasons into five categories: all systems disagree, GSDev and bWAR agree, GSDev and fWAR agree, bWAR and fWAR agree, and all systems agree. There’s a bit of a paradox in this analysis: the category that features the most impressive individual seasons is probably the least interesting one for comparing the systems. So let’s start with the years in which all three systems select the same best pitcher. There are 46 such seasons:

| Year | Pitcher | GSDev | bWAR | fWAR |
| --- | --- | --- | --- | --- |
| 1901 | Cy Young | 14.43 | 12.4 | 7.8 |
| 1902 | Cy Young | 13.01 | 10.1 | 7.7 |
| 1911 | Ed Walsh | 14.37 | 9.2 | 7.6 |
| 1912 | Walter Johnson | 17.40 | 14.3 | 9.3 |
| 1913 | Walter Johnson | 18.48 | 15.2 | 8.5 |
| 1918 | Walter Johnson | 15.95 | 10.5 | 6.5 |
| 1919 | Walter Johnson | 15.46 | 10.8 | 6.8 |
| 1923 | Dolf Luque | 16.22 | 10.7 | 6.7 |
| 1924 | Dazzy Vance | 18.82 | 10.5 | 7.7 |
| 1928 | Dazzy Vance | 16.81 | 10.1 | 6.9 |
| 1930 | Lefty Grove | 15.98 | 10.4 | 8.3 |
| 1931 | Lefty Grove | 18.83 | 10.4 | 7.3 |
| 1932 | Lefty Grove | 15.57 | 9.5 | 7.0 |
| 1934 | Dizzy Dean | 15.85 | 8.9 | 6.5 |
| 1939 | Bob Feller | 16.92 | 9.2 | 6.5 |
| 1942 | Mort Cooper | 14.44 | 8.2 | 6.5 |
| 1943 | Spud Chandler | 15.37 | 6.4 | 6.3 |
| 1944 | Dizzy Trout | 14.74 | 9.3 | 7.3 |
| 1945 | Hal Newhouser | 17.17 | 11.3 | 8.0 |
| 1948 | Harry Brecheen | 14.05 | 8.7 | 7.7 |
| 1949 | Mel Parnell | 13.56 | 8.0 | 6.9 |
| 1950 | Ewell Blackwell | 11.91 | 7.5 | 6.4 |
| 1951 | Robin Roberts | 12.61 | 8.0 | 6.7 |
| 1953 | Robin Roberts | 15.98 | 9.8 | 8.4 |
| 1954 | Robin Roberts | 14.56 | 9.0 | 7.1 |
| 1957 | Frank Sullivan | 13.37 | 6.4 | 6.4 |
| 1959 | Camilo Pascual | 13.78 | 7.8 | 7.6 |
| 1963 | Sandy Koufax | 18.12 | 10.7 | 9.2 |
| 1966 | Sandy Koufax | 18.49 | 10.3 | 9.1 |
| 1968 | Bob Gibson | 20.75 | 11.2 | 8.6 |
| 1970 | Bob Gibson | 15.68 | 8.9 | 9.8 |
| 1972 | Steve Carlton | 18.49 | 12.1 | 11.1 |
| 1980 | Steve Carlton | 21.25 | 10.2 | 8.8 |
| 1985 | Dwight Gooden | 19.5 | 12.2 | 8.9 |
| 1987 | Roger Clemens | 17.11 | 9.4 | 8.4 |
| 1989 | Bret Saberhagen | 16.47 | 9.7 | 7.5 |
| 1990 | Roger Clemens | 15.87 | 10.4 | 8.2 |
| 1994 | Greg Maddux | 17.35 | 8.5 | 7.4 |
| 1997 | Roger Clemens | 20.41 | 11.9 | 10.7 |
| 1998 | Kevin Brown | 17.77 | 8.6 | 9.6 |
| 1999 | Pedro Martinez | 22.60 | 9.8 | 11.6 |
| 2001 | Randy Johnson | 22.14 | 10.1 | 10.4 |
| 2006 | Johan Santana | 16.05 | 7.6 | 6.7 |
| 2009 | Zack Greinke | 17.04 | 10.4 | 8.7 |
| 2012 | Justin Verlander | 16.18 | 8.1 | 6.9 |
| 2013 | Clayton Kershaw | 16.96 | 8.1 | 7.2 |

Unsurprisingly, that list contains some pretty terrific seasons. It also includes several years in which the victorious effort was relatively unimpressive, but the competition was even worse. Still, the nine pitchers with multiple unanimous wins are an impressive group: four for Walter Johnson, three for Roger Clemens, Lefty Grove, and Robin Roberts, and two apiece for Bob Gibson, Steve Carlton, Dazzy Vance, Cy Young, and Sandy Koufax. And of course, the greatest seasons ever have considerable representation on the list; among the top 27 GSDev scores to date (anything over 18), we see W. Johnson ’12 and ’13, Vance ’24, Grove ’31, Koufax ’63 and ’66, Gibson ’68, Carlton ’72 and ’80, Gooden ’85, Clemens ’97, Pedro ’99, and R. Johnson ’01.
That list, however, is not complete. Out of the 23 league-leading GSDev totals that exceeded 18, only 13 came in seasons with a unanimous choice as the best pitcher in baseball. Notable omissions include Ron Guidry 1978, Grover Cleveland Alexander 1915, Greg Maddux 1995, Sandy Koufax 1965, and most startling of all, Pedro Martinez 2000 – the highest single-season GSDev score on record.
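
The five-way split used in this post can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the code behind the series: the function name `classify_season` and the category labels are made up for the example, and it assumes each system names exactly one best pitcher (a shared lead in one system would need a judgment call first).

```python
def classify_season(gsdev: str, bwar: str, fwar: str) -> str:
    """Place one season into one of the five agreement categories.

    Each argument is the name of the pitcher a system ranks first.
    Labels are hypothetical; they just name the lone dissenter, if any.
    """
    if gsdev == bwar == fwar:
        return "all agree"
    if gsdev == fwar:
        return "bWAR dissents"   # GSDev and fWAR agree
    if gsdev == bwar:
        return "fWAR dissents"   # GSDev and bWAR agree
    if bwar == fwar:
        return "GSDev dissents"  # the two WARs agree
    return "all disagree"


# (GSDev pick, bWAR pick, fWAR pick), taken from the tables in this post
examples = {
    1968: ("Bob Gibson", "Bob Gibson", "Bob Gibson"),
    1965: ("Sandy Koufax", "Juan Marichal", "Sandy Koufax"),
    2000: ("Pedro Martinez", "Pedro Martinez", "Randy Johnson"),
}

for year, picks in examples.items():
    print(year, classify_season(*picks))
```

The order of the checks matters only in that the unanimous case is tested first; after that, at most one of the pairwise comparisons can be true.
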
That brings us to the topic of disagreements. With 46 seasons of unanimity, that leaves 76 years of squabbling (including every season after 2013 – a streak that has continued through 2025, as bWAR and fWAR don’t agree in any of the years for which we don’t yet have final GSDev numbers). How do those seasons break down?

| Disagreer | Years |
| --- | --- |
| bWAR | 25 |
| fWAR | 20 |
| GSDev | 10 |
| Everyone | 21 |

That’s a
relatively even split between bWAR and fWAR, a reasonable chunk of unanimous
agreement to disagree… and remarkably few in which GSDev is the lone
holdout. Let’s go through these categories one at a time, starting with our most common
objector.

| Year | fWAR/GSDev Choice | bWAR Rank | bWAR Choice |
| --- | --- | --- | --- |
| 1909 | Mordecai Brown | 2 | Christy Mathewson |
| 1915 | Grover Alexander | 2 | Walter Johnson |
| 1916 | Walter Johnson | 2 | Grover Alexander |
| 1920 | Stan Coveleski | 3 | Grover Alexander |
| 1941 | Whit Wyatt | 3 | Thornton Lee |
| 1947 | Ewell Blackwell | 2 | Warren Spahn |
| 1956 | Herb Score | 2 | Early Wynn |
| 1958 | Sam Jones | 2 | Frank Lary |
| 1965 | Sandy Koufax | 4 | Juan Marichal |
| 1978 | Ron Guidry | 2 | Phil Niekro |
| 1979 | J.R. Richard | 8 | Phil Niekro |
| 1984 | Dwight Gooden | 4 | Dave Stieb |
| 1986 | Mike Scott | 3 | Teddy Higuera |
| 1988 | Roger Clemens | 5 | Mark Gubicza |
| 1991 | Roger Clemens | 2 | Tom Glavine |
| 1996 | John Smoltz | 4 | Pat Hentgen |
| 2002 | Curt Schilling | 2 | Randy Johnson |
| 2004 | Randy Johnson | 2 | Johan Santana |
| 2005 | Johan Santana | 3 | Roger Clemens |
| 2014 | Clayton Kershaw | 2 | Corey Kluber |
| 2015 | Clayton Kershaw | 3 | Zack Greinke |
| 2016 | Clayton Kershaw* | 3 | Justin Verlander |
| 2018 | Jacob deGrom | 2 | Aaron Nola |
| 2019 | Gerrit Cole | 5 | Mike Minor |
| 2021 | Corbin Burnes | 10 | Zack Wheeler |

That’s 25
years in which bWAR was the lone holdout (depending on how you look at 2016, in
which Kershaw was tied for first in fWAR with Jose Fernandez). In twelve of
them, the fWAR/GSDev choice ranked second, which is generally understandable. Throw in another six third-place finishes, which usually aren’t too out there. (Even the near misses are amusing at times, like with the Alexander/Johnson
switcheroo in 1915-16, or the Schilling-Johnson-Santana-Clemens ring-around-the-rosie
in the early 2000s.)
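
For anyone who wants to check the tallies, here is a small Python sketch that counts the bWAR ranks from the table above. The numbers are copied straight from that table; the variable names are just for illustration.

```python
from collections import Counter

# bWAR rank of the fWAR/GSDev choice in each season, from the table above
bwar_rank = {
    1909: 2, 1915: 2, 1916: 2, 1920: 3, 1941: 3, 1947: 2, 1956: 2,
    1958: 2, 1965: 4, 1978: 2, 1979: 8, 1984: 4, 1986: 3, 1988: 5,
    1991: 2, 1996: 4, 2002: 2, 2004: 2, 2005: 3, 2014: 2, 2015: 3,
    2016: 3, 2018: 2, 2019: 5, 2021: 10,
}

rank_counts = Counter(bwar_rank.values())
# Seasons where the consensus pick fell to fourth or lower in bWAR
outliers = sorted(year for year, rank in bwar_rank.items() if rank >= 4)

print(len(bwar_rank))                  # 25 seasons in total
print(rank_counts[2], rank_counts[3])  # 12 second-place and 6 third-place finishes
print(outliers)                        # [1965, 1979, 1984, 1988, 1996, 2019, 2021]
```
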
There are some oddballs, though. We’ve talked about 2019 Gerrit Cole and 2021 Corbin Burnes during the early posts in this series; those were two of the bigger bWAR discrepancies of the bunch. As far as the other fourth-and-below finishes go: Smoltz in ’96 is a fairly standard bWAR/fWAR tiff (FIP lower than ERA despite a good-fielding team), with a side of brilliant postseason work. Despite finishing in fifth place, Clemens in ’88 is a surprisingly close contender, less than a win back of first. Gooden in ’84 had a shocking-for-the-time 11.4 K/9 in his rookie year, thereby producing one of the best FIP seasons ever; his ERA was nearly a run higher.
Richard in
’79 surprises me. Yes, his FIP is lower than his ERA by half a run – but he
still led the majors in ERA. Houston was a pitcher’s park, but not a crazy one
(Richard’s personal park factor for the year was 93, according to B-R, higher
than any he’d had since 1975). bWAR thinks the Astros’ defense was good that
year, but Richard’s BABIP allowed was barely different from average (and that
difference probably came from the park factor). I can understand not having him
in first, but #8 (and nearly 2 wins out of first) is a leap.
And then there’s the big one. Sandy Koufax 1965, the #4 season ever by GSDev, and #7 in the sample according to fWAR, ranks fourth for the year in bWAR. Admittedly, his margin behind Sam McDowell and Jim Maloney is minuscule. But Juan Marichal bests him by 10.3 bWAR to 8.1. Marichal’s 1965 is quite a season in its own right, one of the top 60 of all time by GSDev. Koufax won the ERA title, but Marichal only trailed him by 0.09, and was in a tougher park; he led the majors in ERA+. However, Koufax threw 40 more innings (not even counting the playoffs). You’d think they’d at least be close. The difference? According to bWAR, the Giants had an average defense, while LA’s was phenomenal. This was not typical; for most of Koufax’s best years, the Dodger fielders grade as average or below. From 1961-66, Koufax’s fielding adjustment per 9 innings in bWAR goes: -0.20, -0.07, -0.15, 0.03, 0.30, -0.07. Now, if you look at Koufax’s BABIP in ’65, you might buy it; his .238 mark was 40 points better than league average. Marichal’s BABIP, by contrast, was .238. Which is not actually a contrast, given that it’s the same number.
This is all
having the discussion on bWAR’s terms; we’re ignoring Koufax’s 382 strikeouts,
1.93 FIP, and phenomenal World Series performance. Even looking at the things
that bWAR looks at, I don’t agree that there are two wins of margin in
Marichal’s favor here.
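
To get a feel for the size of that 1965 fielding adjustment, here is a rough back-of-the-envelope sketch in Python. It is not how bWAR is computed; it only scales the 0.30 runs per 9 innings quoted above over a full season. Two inputs are not from this post and should be treated as assumptions: Koufax’s regular-season innings total (335 2/3, from his standard record) and a generic rule-of-thumb conversion of 10 runs per win.

```python
def innings_to_float(ip: str) -> float:
    """Convert baseball innings notation ('335.2' means 335 and 2/3) to a float."""
    whole, _, thirds = ip.partition(".")
    return int(whole) + (int(thirds) / 3 if thirds else 0.0)


ADJ_PER_9 = 0.30          # Koufax's 1965 fielding adjustment per 9 innings (from the post)
KOUFAX_IP_1965 = "335.2"  # assumption: regular-season innings, not stated in the post
RUNS_PER_WIN = 10.0       # assumption: rule-of-thumb conversion, not bWAR's actual figure

adj_runs = ADJ_PER_9 * innings_to_float(KOUFAX_IP_1965) / 9
adj_wins = adj_runs / RUNS_PER_WIN

print(f"{adj_runs:.1f} runs")   # 11.2 runs
print(f"{adj_wins:.1f} wins")   # 1.1 wins under the assumed conversion
```

Under those assumptions, the defensive credit alone is worth something like a win, which is a sizable share of the gap between the two pitchers even before park and league adjustments enter the picture.
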
Having dumped
on one WAR system, let’s switch to the other! Here are the 20 seasons in which
fWAR is the disagreeable option:

| Year | bWAR/GSDev Choice | fWAR Rank | fWAR Choice |
| --- | --- | --- | --- |
| 1914 | Walter Johnson | 2 | Cy Falkenberg |
| 1917 | Eddie Cicotte | 3 | Grover Alexander |
| 1921 | Red Faber | 3 | Stan Coveleski |
| 1922 | Red Faber | 2 | Urban Shocker |
| 1926 | George Uhle | 2 | Lefty Grove |
| 1933 | Carl Hubbell | 2 | Dizzy Dean |
| 1940 | Bob Feller | 2 | Ray Brown |
| 1952 | Bobby Shantz | 2 | Robin Roberts |
| 1955 | Billy Pierce | 2 | Bob Rush |
| 1967 | Jim Bunning | 2 | Dean Chance |
| 1969 | Bob Gibson | 2 | Sam McDowell |
| 1973 | Tom Seaver | 4 | Bert Blyleven |
| 1975 | Jim Palmer | 4 | Tom Seaver |
| 1992 | Greg Maddux | 3 | Roger Clemens |
| 1993 | Kevin Appier | 4 | Greg Maddux |
| 1995 | Greg Maddux | 2 | Randy Johnson |
| 2000 | Pedro Martinez | 2 | Randy Johnson |
| 2010 | Roy Halladay | 4 | Cliff Lee |
| 2017 | Corey Kluber | 2 | Chris Sale |
| 2022 | Sandy Alcantara | 4 | Aaron Nola |

The fWAR table looks a bit more under control than the bWAR table; none of the consensus top finishers from the other two systems land outside the top 4, and 12 of the 20 finish #2. Of those, the obvious one to highlight is Unit over Pedro in 2000, a year in which Martinez had the single highest GSDev score to date. Johnson had more innings, but not by a huge margin (248.2 to 217), and Pedro led the majors in FIP and a number of related categories. Johnson’s year was excellent as well (he led the NL in FIP and ERA+, and the majors in strikeouts and strikeout rate), but Pedro beat him by 36 points in FIP despite being in the DH league. To put it mildly, this outcome is a surprise to me.
The other
standout seasons are the years in which the fWAR fourth-place finisher leads in
the other two systems. There’s not necessarily going to be much to say here in
most cases. Take 1973, Seaver vs. Blyleven. Seaver has a better ERA (and
allowed just 7 unearned runs to Blyleven’s 18), and pitched extremely well in the
playoffs. Blyleven had a better FIP and pitched 35 more innings in the regular
season (Seaver’s postseason makes up most of that difference as well). Seaver
vs. Palmer two years later puts Tom Terrific on the other side of the exchange;
Palmer had a 2.09 ERA but a 2.96 FIP that year. Appier vs. Maddux? Appier’s RA
was lower despite being in the DH league; Maddux’s FIP was lower and he pitched
more innings. (Maddux also struggled in the postseason that year, which doesn’t
interest fWAR at all.) Halladay/Lee and Alcantara/Nola are more of the same,
with FIP/ERA differences in opposite directions. The 2010 Halladay/Lee race was
incredibly close in GSDev, as were some of the others listed here (Cicotte/Alexander
in 1917, Maddux/Clemens in 1992).
Speaking of close races, we should mention Shantz vs. Roberts in 1952. GSDev has the margin between
those two pitchers as 0.01, the narrowest in any season. 1952 was roughly one
wild pitch away from being in the bWAR table instead of this one. (As it
happens, Shantz threw 0 wild pitches in 1952, and Roberts threw 2; that was
enough to swing the results.)
On to our
final group for this post: years in which bWAR and fWAR agree on the best
pitcher, and GSDev dissents.

| Year | bWAR/fWAR Choice | GSDev Rank | GSDev Choice |
| --- | --- | --- | --- |
| 1904 | Rube Waddell | 2 | Jack Chesbro |
| 1907 | Christy Mathewson | 2 | Cy Young |
| 1908 | Christy Mathewson | 2 | Ed Walsh |
| 1925 | Bullet Rogan | NA | Dazzy Vance |
| 1935 | Lefty Grove | 2 | Cy Blanton |
| 1937 | Lefty Grove | 2 | Lefty Gomez |
| 1946 | Bob Feller | 2 | Hal Newhouser |
| 1964 | Dean Chance | 2 | Don Drysdale |
| 2011 | Roy Halladay | 2 | Justin Verlander |
| 2020 | Shane Bieber | 2 | Trevor Bauer |

The obvious
standout here is Negro League great Bullet Rogan in 1925, who I don’t have the
data to rank via GSDev. Given the agreement between bWAR and fWAR on Rogan’s
prowess and the fact that neither Vance nor anyone else in MLB had a
particularly exceptional year, I’m reasonably confident that Rogan should
indeed be in the #1 spot.
Outside of that, there are nine seasons in which both WAR systems agree on the best pitcher and GSDev does not. In all nine cases, GSDev ranks the WAR consensus in the #2 spot. Those discrepancies are both fewer and smaller than the ones produced by the other two systems – but that makes sense, because GSDev is to some extent a compromise between the two WARs, accounting for factors used by both of them. So what causes the differences?
The most obvious option is one we’ve brought up frequently throughout the series: the postseason, which GSDev counts and both WARs ignore. This is not as much of a factor in these particular years as might be expected, for a straightforward reason: of the 18 pitchers in question, only 5 reached the postseason. Those were Gomez in 1937, Verlander and Halladay in 2011, and Bauer and Bieber in 2020. Working in reverse order: 2020 was definitely decided by the postseason; Bieber had his worst start of the year in his only playoff outing, while Bauer’s lone playoff start was his best of the shortened campaign. In 2011, the postseason cut the other way: Halladay’s two excellent postseason starts brought him up quite a bit, while Verlander’s four mediocre outings didn’t move his score at all, and GSDev still prefers Verlander. Meanwhile, in ’37, Gomez does lose two very good World Series starts if you ignore the playoffs… and his margin over Grove is so big that he easily wins anyway.
So, we’ve
accounted for one season out of nine. Let’s run through the others chronologically.
1904, Chesbro
vs. Waddell. For two deadball aces, you’d think relief work would be a factor;
it is, but only to a small extent. Waddell had no relief outings this year;
Chesbro had four (allowing 6 runs in 10.2 innings), compared to 51 starts (48
of which he completed). Waddell had 349 strikeouts, a total that nobody would match for over 60 years, but he also allowed more
walks, homers, and hit batters than Chesbro in 60 fewer innings as a starter.
Waddell’s per-inning rates of runs and hits allowed were also higher. Chesbro’s
sets of parks and opponents were a bit tougher, and he recorded about one extra
out per start. The extra 100-plus strikeouts (in five fewer starts) are a big
deficit to make up; GSDev thinks Chesbro does enough.
1907, Young
vs. Mathewson. These two had a bit more relief work – but Young, GSDev’s choice as the superior starter, also pitched better in relief. Mathewson holds a slight lead in raw
Game Score, 71.9 to 71.3, but he was in the lower-scoring NL, and avoided
facing the second highest-scoring team in the league (because he played for
them). Young’s Boston Americans, meanwhile, sported the AL’s feeblest lineup. The
environmental adjustment boosts Young’s per-game numbers above Matty’s, and he
had an extra start on top of that.
1908, Walsh
vs. Mathewson. I promise GSDev doesn’t hate Christy Mathewson; it does give him
a couple of #1 finishes that we’ll discuss in the next post. For now, though, we
finally get some relevant bullpen work to examine. This time, it very well may make a
substantive difference, as Mathewson’s 12 relief appearances added up to 28
innings and only 5 runs allowed, while Walsh’s 17 bullpen efforts comprised 38
innings and 18 runs. Removing his comparatively ineffective relief outings nudges Walsh’s RA
as a starter just below Mathewson’s. Matty allowed fewer hits and walks per
inning; he gave up 5 homers to Walsh’s 1, but Walsh allowed far more hit
batters and wild pitches. And this time, it’s Walsh who had the more
pitcher-friendly environment. The difference between them on a per-start basis
is razor-thin… but Walsh had five more starts, and thus pulls ahead.
1935, Blanton
vs. Grove. Relief is a factor here as well; the pitchers made five relief
outings each, but Blanton allowed six runs in his 9.1 bullpen innings to
Grove’s three in 12. The rest of the difference is hard to assess, largely
because Grove and Blanton pitched in different leagues, and there was a large,
persistent difference between the leagues in the ‘30s, with the AL being
notably higher-scoring every year. GSDev adjusts for this, of course; Grove’s
average park-opponent combo was 0.41 runs per game higher than Blanton’s. That
adjustment makes up nearly 60% of the difference in their raw per-game numbers,
but that still leaves Blanton ahead. It may genuinely be the relief outings
making up the whole difference here.
1937, Gomez
vs. Grove. Most of these matchups are pretty close, as you might expect from seasons
in which GSDev disagrees with a WAR consensus. This one? GSDev has this as a
blowout. So what’s up? Well, first things first; neither pitcher made a relief
appearance all year. Gomez had the better ERA (2.33 to 3.02) and FIP (3.29 to
3.44), leading the league in both categories. He also led the league in hits
per 9, strikeouts, strikeouts per 9, strikeout to walk ratio, and shutouts. He
had two more regular season starts than Grove, and their average innings per start were nearly identical. Grove was in tougher parks, and Gomez ducked
the supremely excellent Yankee lineup; accounting for this only narrows his Game Score advantage from 5.9 to 3.5. And that’s before mentioning Gomez’s two
complete game wins in the World Series. Sometimes it’s a challenge to figure
out what GSDev is thinking; this is not one of those times.
1946,
Newhouser vs. Feller. This is a close race in all three systems. It’s also
relatively easy to explain. Newhouser was a better pitcher on a per-game basis;
he led the AL in both ERA and FIP despite working in a hitters’ park while
Feller benefited from the best pitchers’ park in the game. Feller made 8 more starts,
totaling 70 additional innings. Both pitchers fare spectacularly well in all three
systems, but GSDev narrowly prefers Newhouser’s per-game excellence, while WAR
leans toward Feller’s higher volume.
1964, Drysdale vs. Chance. In the season that inverts the last two digits of 1946, the race’s outcome is also inverted. This time, it’s Chance who was better on a per-game basis, and Drysdale who had the extra starts. GSDev picking Drysdale honestly surprises me. Chance’s ERA and FIP were both better, despite being in a similar park in the higher-scoring league. Chance had 11 relief outings, but pitched better as a starter than as a reliever. The other little things that might result in a swing (unearned runs, wild pitches, hit batters) are all in Chance’s favor as well. The only things keeping Drysdale’s per-start numbers close are a higher inning count (8 innings per start to 7.3) and a lower walk rate (1.7 per start to 2.2). Ultimately, Chance leads Drysdale in average adjusted Game Score, 64.9 to 64.1, but that margin is close enough for Drysdale’s 40-35 lead in starts to make the difference. (As a bonus note, Sandy Koufax blew past both of them in average Game Score, with a 67.3, but in only 28 starts.)
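
The tug-of-war between per-start quality and number of starts can be illustrated with a toy model. To be clear, this is not the GSDev formula, which isn’t spelled out in this post. It simply assumes, for illustration, that a season is worth starts × (average adjusted Game Score − some baseline), and asks what baseline would make two pitchers come out even. The function names and the model itself are hypothetical.

```python
def season_value(avg_game_score: float, starts: int, baseline: float) -> float:
    """Toy season value: total Game Score points above a chosen baseline."""
    return starts * (avg_game_score - baseline)


def breakeven_baseline(avg_more: float, starts_more: int,
                       avg_fewer: float, starts_fewer: int) -> float:
    """Baseline at which the two toy season values are equal.

    Solves starts_more * (avg_more - b) == starts_fewer * (avg_fewer - b) for b.
    Below this baseline, the pitcher with more starts comes out ahead.
    """
    return (starts_more * avg_more - starts_fewer * avg_fewer) / (starts_more - starts_fewer)


# Averages and start counts quoted in the paragraph above
drysdale = (64.1, 40)
chance = (64.9, 35)
koufax = (67.3, 28)

print(f"{breakeven_baseline(*drysdale, *chance):.1f}")  # 58.5
print(f"{breakeven_baseline(*drysdale, *koufax):.1f}")  # 56.6
```

In this toy setup, Drysdale’s five extra starts outweigh Chance’s 0.8-point edge for any baseline below 58.5, and his twelve extra starts outweigh Koufax’s 3.2-point edge for any baseline below about 56.6. The higher the bar for a "baseline" start, the more the per-start leader is favored.
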
2011, Verlander vs. Halladay. If this were regular-season only, Verlander would win easily; his average adjusted Game Score was higher and he had two additional starts. This lead was likely because of something that has been mentioned a couple of times so far without really being highlighted: hits allowed. GSDev is the only one of these systems that considers hits allowed as a factor. Even including the postseason (in which Halladay pitched well and Verlander struggled somewhat, although he did extend his advantage in starts), Verlander allowed 6.4 hits per 9 innings to Halladay’s 7.8. That was enough to keep pace with Halladay’s superior fielding-independent numbers, which in turn allowed the extra starts to make the difference.
And that’s
all ten of GSDev’s dissents explored. Next time, we’ll go through the remaining
category: the 21 seasons in which GSDev, bWAR, and fWAR have all agreed to
disagree on the identity of baseball’s best pitcher.