SABER All I Want To: Starting Pitcher Ratings: Starting Pitcher of the Year

So far in this series, we’ve used the GSDev rating system to examine the best starting pitching seasons and careers (over the course of three parts) since the founding of the American League. Now, we’ll be looking at the best pitchers year by year, and doing so in comparison to the systems whose deficiencies inspired us to take a fresh look in the first place: bWAR and fWAR.

We have 122 seasons of essentially complete GSDev data, stretching from 1901-2022. When considering the best pitcher according to all three systems, we can split those seasons into five categories: all systems disagree, GSDev and bWAR agree, GSDev and fWAR agree, bWAR and fWAR agree, and all systems agree. There’s a bit of a paradox in this analysis, in that the option that produces the most interesting efforts by the individual players is probably the least interesting to look at in comparing the systems. So let’s start with the years in which all three systems select the same best pitcher. There are 46 such seasons:

Year	Pitcher	GSDev	bWAR	fWAR
1901	Cy Young	14.43	12.4	7.8
1902	Cy Young	13.01	10.1	7.7
1911	Ed Walsh	14.37	9.2	7.6
1912	Walter Johnson	17.40	14.3	9.3
1913	Walter Johnson	18.48	15.2	8.5
1918	Walter Johnson	15.95	10.5	6.5
1919	Walter Johnson	15.46	10.8	6.8
1923	Dolf Luque	16.22	10.7	6.7
1924	Dazzy Vance	18.82	10.5	7.7
1928	Dazzy Vance	16.81	10.1	6.9
1930	Lefty Grove	15.98	10.4	8.3
1931	Lefty Grove	18.83	10.4	7.3
1932	Lefty Grove	15.57	9.5	7.0
1934	Dizzy Dean	15.85	8.9	6.5
1939	Bob Feller	16.92	9.2	6.5
1942	Mort Cooper	14.44	8.2	6.5
1943	Spud Chandler	15.37	6.4	6.3
1944	Dizzy Trout	14.74	9.3	7.3
1945	Hal Newhouser	17.17	11.3	8.0
1948	Harry Brecheen	14.05	8.7	7.7
1949	Mel Parnell	13.56	8.0	6.9
1950	Ewell Blackwell	11.91	7.5	6.4
1951	Robin Roberts	12.61	8.0	6.7
1953	Robin Roberts	15.98	9.8	8.4
1954	Robin Roberts	14.56	9.0	7.1
1957	Frank Sullivan	13.37	6.4	6.4
1959	Camilo Pascual	13.78	7.8	7.6
1963	Sandy Koufax	18.12	10.7	9.2
1966	Sandy Koufax	18.49	10.3	9.1
1968	Bob Gibson	20.75	11.2	8.6
1970	Bob Gibson	15.68	8.9	9.8
1972	Steve Carlton	18.49	12.1	11.1
1980	Steve Carlton	21.25	10.2	8.8
1985	Dwight Gooden	19.50	12.2	8.9
1987	Roger Clemens	17.11	9.4	8.4
1989	Bret Saberhagen	16.47	9.7	7.5
1990	Roger Clemens	15.87	10.4	8.2
1994	Greg Maddux	17.35	8.5	7.4
1997	Roger Clemens	20.41	11.9	10.7
1998	Kevin Brown	17.77	8.6	9.6
1999	Pedro Martinez	22.60	9.8	11.6
2001	Randy Johnson	22.14	10.1	10.4
2006	Johan Santana	16.05	7.6	6.7
2009	Zack Greinke	17.04	10.4	8.7
2012	Justin Verlander	16.18	8.1	6.9
2013	Clayton Kershaw	16.96	8.1	7.2

Unsurprisingly, that list contains some pretty terrific seasons. It also includes several years in which the victorious effort was relatively unimpressive, but the competition was even worse. Still, the nine pitchers with multiple unanimous wins are an impressive group: four for Walter Johnson, three for Roger Clemens, Lefty Grove, and Robin Roberts, and two apiece for Bob Gibson, Steve Carlton, Dazzy Vance, Cy Young, and Sandy Koufax. And of course, the greatest seasons ever have considerable representation on the list; among the top 27 GSDev scores to date (anything over 18), we see W. Johnson ’12 and ’13, Vance ’24, Grove ’31, Koufax ’63 and ’66, Gibson ’68, Carlton ’72 and ’80, Gooden ’85, Clemens ’97, Pedro ’99, and R. Johnson ’01.

That list, however, is not complete. Out of the 23 league-leading GSDev totals that exceeded 18, only 14 were unanimously acclaimed as the best pitcher in baseball. Notable omissions include Ron Guidry 1978, Grover Cleveland Alexander 1915, Greg Maddux 1995, Sandy Koufax 1965, and most startling of all, Pedro Martinez 2000 – the highest single-season GSDev score on record.

That brings us to the topic of disagreements. With 46 seasons of unanimity, that leaves 76 years of squabbling (including every season since 2013 – a streak that has continued through 2025, as bWAR and fWAR don’t agree in any of the years for which we don’t yet have final GSDev numbers). How do those seasons break down?

Disagreer	Years
bWAR	25
fWAR	20
GSDev	10
Everyone	21
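The five-way split used in this post can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the code behind the tables: the function name and labels are made up for the example, and ties (like 2016 in fWAR) are not handled. The sample picks are taken from the tables in this post.

```python
def classify_season(gsdev: str, bwar: str, fwar: str) -> str:
    """Return which system (if any) is the lone holdout for one season."""
    if gsdev == bwar == fwar:
        return "Unanimous"
    if gsdev == fwar:
        return "bWAR"       # bWAR disagrees with the other two
    if gsdev == bwar:
        return "fWAR"       # fWAR disagrees with the other two
    if bwar == fwar:
        return "GSDev"      # GSDev disagrees with the other two
    return "Everyone"       # three different picks


# Picks from this post's tables, ordered (GSDev, bWAR, fWAR).
picks = {
    1999: ("Pedro Martinez", "Pedro Martinez", "Pedro Martinez"),
    1965: ("Sandy Koufax", "Juan Marichal", "Sandy Koufax"),
    2000: ("Pedro Martinez", "Pedro Martinez", "Randy Johnson"),
    2011: ("Justin Verlander", "Roy Halladay", "Roy Halladay"),
}
labels = {year: classify_season(*p) for year, p in picks.items()}

# Season counts as reported in the post; together they should cover 1901-2022.
reported = {"Unanimous": 46, "bWAR": 25, "fWAR": 20, "GSDev": 10, "Everyone": 21}
total_seasons = sum(reported.values())
```

The reported counts add up to the 122 seasons in the sample, which is a quick sanity check that no year landed in two categories.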

That’s a relatively even split between bWAR and fWAR, a reasonable chunk of unanimous agreement to disagree… and remarkably few in which GSDev is the lone holdout. Let’s go through these categories one at a time, starting with our most common objector.

Year	fWAR/GSDev Choice	bWAR Rank	bWAR Choice
1909	Mordecai Brown	2	Christy Mathewson
1915	Grover Alexander	2	Walter Johnson
1916	Walter Johnson	2	Grover Alexander
1920	Stan Coveleski	3	Grover Alexander
1941	Whit Wyatt	3	Thornton Lee
1947	Ewell Blackwell	2	Warren Spahn
1956	Herb Score	2	Early Wynn
1958	Sam Jones	2	Frank Lary
1965	Sandy Koufax	4	Juan Marichal
1978	Ron Guidry	2	Phil Niekro
1979	JR Richard	8	Phil Niekro
1984	Dwight Gooden	4	Dave Stieb
1986	Mike Scott	3	Teddy Higuera
1988	Roger Clemens	5	Mark Gubicza
1991	Roger Clemens	2	Tom Glavine
1996	John Smoltz	4	Pat Hentgen
2002	Curt Schilling	2	Randy Johnson
2004	Randy Johnson	2	Johan Santana
2005	Johan Santana	3	Roger Clemens
2014	Clayton Kershaw	2	Corey Kluber
2015	Clayton Kershaw	3	Zack Greinke
2016	Clayton Kershaw*	3	Justin Verlander
2018	Jacob deGrom	2	Aaron Nola
2019	Gerrit Cole	5	Mike Minor
2021	Corbin Burnes	10	Zack Wheeler

That’s 25 years in which bWAR was the lone holdout (depending on how you look at 2016, in which Kershaw was tied for first in fWAR with Jose Fernandez). In twelve of them, the fWAR/GSDev choice ranked second, which is generally understandable. Throw in another six third-place finishes, which usually aren’t too out there. (Even the near misses are amusing at times, like with the Alexander/Johnson switcheroo in 1915-16, or the Schilling-Johnson-Santana-Clemens ring-around-the-rosie in the early 2000s.)

There are some oddballs, though. We’ve talked about 2019 Gerrit Cole and 2021 Corbin Burnes during the early posts in this series; those were two of the bigger bWAR discrepancies of the bunch. As far as the other fourth-and-below finishes go: Smoltz in ’96 is a fairly standard bWAR/fWAR tiff (FIP lower than ERA despite a good-fielding team), with a side of brilliant postseason work. Despite finishing in fifth place, Clemens in ’88 is a surprisingly close contender, less than a win back of first. Gooden in ’84 had a shocking-for-the-time 11.4 K/9 in his rookie year, thereby producing one of the best FIP seasons ever; his ERA was nearly a run higher.

Richard in ’79 surprises me. Yes, his FIP is lower than his ERA by half a run – but he still led the majors in ERA. Houston was a pitcher’s park, but not a crazy one (Richard’s personal park factor for the year was 93, according to B-R, higher than any he’d had since 1975). bWAR thinks the Astros’ defense was good that year, but Richard’s BABIP allowed was barely different from average (and that difference probably came from the park factor). I can understand not having him in first, but #8 (and nearly 2 wins out of first) is a leap.

And then there’s the big one. Sandy Koufax 1965, the #4 season ever by GSDev, and #7 in the sample according to fWAR, ranks fourth for the year in bWAR. Admittedly, his margin behind Sam McDowell and Jim Maloney is minuscule. But Juan Marichal bests him by 10.3 bWAR to 8.1. Marichal’s 1965 is quite a season in its own right, one of the top 60 of all time by GSDev. Koufax won the ERA title, but Marichal only trailed him by 0.09, and was in a tougher park; he led the majors in ERA+. However, Koufax threw 40 more innings (not even counting the playoffs). You’d think they’d at least be close. The difference? According to bWAR, the Giants had an average defense, while LA’s was phenomenal. This was not typical; for most of Koufax’s best years, the Dodger fielders grade as average or below. From 1961-66, Koufax’s fielding adjustment per 9 innings in bWAR goes: -0.20, -0.07, -0.15, 0.03, 0.30, -0.07. Now, if you look at Koufax’s BABIP in ’65, you might buy it; his .238 mark was 40 points better than league average. Marichal’s BABIP, by contrast, was .238. Which is not actually a contrast, given that it’s the same number.

This is all having the discussion on bWAR’s terms; we’re ignoring Koufax’s 382 strikeouts, 1.93 FIP, and phenomenal World Series performance. Even looking at the things that bWAR looks at, I don’t agree that there are two wins of margin in Marichal’s favor here.
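To get a feel for how much that 1965 fielding adjustment matters, here is a rough back-of-the-envelope sketch. It scales the per-9-inning figures quoted above to a full season, treating a positive value as bWAR rating the defense behind Koufax as above average. Two things are assumptions on my part rather than anything stated in this post: Koufax's regular-season total of 335 2/3 innings, and the idea that simply multiplying the per-9 figure by innings approximates what bWAR does internally.

```python
# Koufax's bWAR fielding adjustment per 9 innings, 1961-66, as quoted above.
fielding_adj_per9 = {1961: -0.20, 1962: -0.07, 1963: -0.15,
                     1964: 0.03, 1965: 0.30, 1966: -0.07}

# Average of the other five seasons, for comparison with 1965.
others = [v for year, v in fielding_adj_per9.items() if year != 1965]
mean_other_years = sum(others) / len(others)

# ASSUMPTION: 335 2/3 regular-season innings in 1965 (not stated in the post,
# which only gives the 40-inning gap over Marichal).
koufax_ip_1965 = 335 + 2 / 3


def season_runs(adj_per9: float, innings: float) -> float:
    """Scale a per-9-inning adjustment to a full-season run total."""
    return adj_per9 * innings / 9


runs_1965 = season_runs(fielding_adj_per9[1965], koufax_ip_1965)
runs_if_typical = season_runs(mean_other_years, koufax_ip_1965)
swing = runs_1965 - runs_if_typical
```

Under those assumptions, the 1965 figure works out to roughly 11 runs credited to the fielders, versus about 3 runs in the other direction had the defense graded like it did in his other peak years. That is a swing of around 14 or 15 runs, which is the kind of gap that can plausibly account for a win or more of WAR.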

Having dumped on one WAR system, let’s switch to the other! Here are the 20 seasons in which fWAR is the disagreeable option:

Year	bWAR/GSDev Choice	fWAR Rank	fWAR Choice
1914	Walter Johnson	2	Cy Falkenberg
1917	Eddie Cicotte	3	Grover Alexander
1921	Red Faber	3	Stan Coveleski
1922	Red Faber	2	Urban Shocker
1926	George Uhle	2	Lefty Grove
1933	Carl Hubbell	2	Dizzy Dean
1940	Bob Feller	2	Ray Brown
1952	Bobby Shantz	2	Robin Roberts
1955	Billy Pierce	2	Bob Rush
1967	Jim Bunning	2	Dean Chance
1969	Bob Gibson	2	Sam McDowell
1973	Tom Seaver	4	Bert Blyleven
1975	Jim Palmer	4	Tom Seaver
1992	Greg Maddux	3	Roger Clemens
1993	Kevin Appier	4	Greg Maddux
1995	Greg Maddux	2	Randy Johnson
2000	Pedro Martinez	2	Randy Johnson
2010	Roy Halladay	4	Cliff Lee
2017	Corey Kluber	2	Chris Sale
2022	Sandy Alcantara	4	Aaron Nola

The fWAR table looks a bit more under control than the bWAR table; none of the consensus top finishers from the other two systems land outside the top 4, and 12 of the 20 finish #2. Of those, the obvious one to highlight is Unit over Pedro in 2000, a year in which Martinez had the single highest GSDev score to date. Johnson had more innings, but not by a huge margin (248.2 to 217), and Pedro led the majors in FIP and a number of related categories. Johnson’s year was excellent as well (he led the NL in FIP and ERA+, and the majors in strikeouts and strikeout rate), but Pedro beat him by 36 points in FIP despite being in the DH league. To put it mildly, this outcome is a surprise for me.
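A small aside on the innings figures: "248.2" is baseball notation for 248 innings plus two outs, not a decimal, so subtracting the numbers directly gives a slightly wrong gap. A minimal sketch of the conversion follows; the helper names are made up for this example.

```python
def innings_to_outs(ip: str) -> int:
    """Convert innings notation ('248.2' = 248 innings + 2 outs) to total outs."""
    whole, _, frac = ip.partition(".")
    extra = int(frac) if frac else 0
    if extra not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) * 3 + extra


def outs_to_innings(outs: int) -> str:
    """Convert total outs back to innings notation."""
    whole, extra = divmod(outs, 3)
    return f"{whole}.{extra}" if extra else str(whole)


johnson = innings_to_outs("248.2")   # Randy Johnson, 2000
pedro = innings_to_outs("217")       # Pedro Martinez, 2000
gap = outs_to_innings(johnson - pedro)
share = (johnson - pedro) / pedro    # Johnson's extra workload relative to Pedro's
```

The gap comes to 31.2 innings (31 innings and two outs), or a bit under 15% more work for Johnson.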

The other standout seasons are the years in which the fWAR fourth-place finisher leads in the other two systems. There’s not necessarily going to be much to say here in most cases. Take 1973, Seaver vs. Blyleven. Seaver has a better ERA (and allowed just 7 unearned runs to Blyleven’s 18), and pitched extremely well in the playoffs. Blyleven had a better FIP and pitched 35 more innings in the regular season (Seaver’s postseason makes up most of that difference as well). Seaver vs. Palmer two years later puts Tom Terrific on the other side of the exchange; Palmer had a 2.09 ERA but a 2.96 FIP that year. Appier vs. Maddux? Appier’s RA was lower despite being in the DH league; Maddux’s FIP was lower and he pitched more innings. (Maddux also struggled in the postseason that year, which doesn’t interest fWAR at all.) Halladay/Lee and Alcantara/Nola are more of the same, with FIP/ERA differences in opposite directions. The 2010 Halladay/Lee race was incredibly close in GSDev, as were some of the others listed here (Cicotte/Alexander in 1917, Maddux/Clemens in 1992).

Speaking of close races, we should mention Shantz vs. Roberts in 1952. GSDev has the margin between those two pitchers as 0.01, the narrowest in any season. 1952 was roughly one wild pitch away from being in the bWAR table instead of this one. (As it happens, Shantz threw 0 wild pitches in 1952, and Roberts threw 2; that was enough to swing the results.)

On to our final group for this post: years in which bWAR and fWAR agree on the best pitcher, and GSDev dissents.

Year	bWAR/fWAR Choice	GSDev Rank	GSDev Choice
1904	Rube Waddell	2	Jack Chesbro
1907	Christy Mathewson	2	Cy Young
1908	Christy Mathewson	2	Ed Walsh
1925	Bullet Rogan	NA	Dazzy Vance
1935	Lefty Grove	2	Cy Blanton
1937	Lefty Grove	2	Lefty Gomez
1946	Bob Feller	2	Hal Newhouser
1964	Dean Chance	2	Don Drysdale
2011	Roy Halladay	2	Justin Verlander
2020	Shane Bieber	2	Trevor Bauer

The obvious standout here is Negro League great Bullet Rogan in 1925, who I don’t have the data to rank via GSDev. Given the agreement between bWAR and fWAR on Rogan’s prowess and the fact that neither Vance nor anyone else in MLB had a particularly exceptional year, I’m reasonably confident that Rogan should indeed be in the #1 spot.

Outside of that, there are nine seasons in which both WAR systems agree on the best pitcher and GSDev does not. In all nine cases, GSDev ranks the WAR consensus in the #2 spot. That is both fewer and smaller discrepancies than have arisen from the other two systems – but that makes sense, because GSDev is to some extent a compromise between the two WARs, accounting for factors used by both of them. So what causes the differences?

The most obvious option is one we’ve brought up frequently throughout the series: the postseason, which GSDev counts and both WARs ignore. This is not as much of a factor in these particular years as might be expected, for a straightforward reason: of the 18 pitchers in question, only 5 reached the postseason. Those were Gomez in 1937, Verlander and Halladay in 2011, and Bauer and Bieber in 2020. Working in reverse order: 2020 was definitely decided by the postseason; Bieber had his worst start of the year in his only playoff outing, while Bauer’s lone playoff start was his best of the shortened campaign. 2011 has the opposite effect; Halladay’s two excellent postseason starts brought him up quite a bit, while Verlander’s four mediocre outings didn’t move his score at all. Meanwhile, in ’37, Gomez does lose two very good World Series starts if you ignore the playoffs… and his margin over Grove is so big that he easily wins anyway.

So, we’ve accounted for one season out of nine. Let’s run through the others chronologically.

1904, Chesbro vs. Waddell. For two deadball aces, you’d think relief work would be a factor; it is, but only to a small extent. Waddell had no relief outings this year; Chesbro had four (allowing 6 runs in 10.2 innings), compared to 51 starts (48 of which he completed). Waddell had 349 strikeouts, a total that nobody would match for over 60 years, but he also allowed more walks, homers, and hit batters than Chesbro in 60 fewer innings as a starter. Waddell’s per-inning rates of runs and hits allowed were also higher. Chesbro’s sets of parks and opponents were a bit tougher, and he recorded about one extra out per start. The extra 100-plus strikeouts (in five fewer starts) are a big deficit to make up; GSDev thinks Chesbro does enough.

1907, Young vs. Mathewson. These two had a bit more relief work – but Young, GSDev’s choice as the superior starter, also pitched better in relief. Mathewson holds a slight lead in raw Game Score, 71.9 to 71.3, but he was in the lower-scoring NL, and avoided facing the second highest-scoring team in the league (because he played for them). Young’s Boston Americans, meanwhile, sported the AL’s feeblest lineup. The environmental adjustment boosts Young’s per-game numbers above Matty’s, and he had an extra start on top of that.

1908, Walsh vs. Mathewson. I promise GSDev doesn’t hate Christy Mathewson; it does give him a couple of #1 finishes that we’ll discuss in the next post. For now, though, we finally get some relevant bullpen work to examine. This time, it very well may make a substantive difference, as Mathewson’s 12 relief appearances added up to 28 innings and only 5 runs allowed, while Walsh’s 17 bullpen efforts comprised 38 innings and 18 runs. Removing his comparatively ineffective relief outings nudges Walsh’s RA as a starter just below Mathewson’s. Matty allowed fewer hits and walks per inning; he gave up 5 homers to Walsh’s 1, but Walsh allowed far more hit batters and wild pitches. And this time, it’s Walsh who had the more pitcher-friendly environment. The difference between them on a per-start basis is razor-thin… but Walsh had five more starts, and thus pulls ahead.

1935, Blanton vs. Grove. Relief is a factor here as well; the pitchers made five relief outings each, but Blanton allowed six runs in his 9.1 bullpen innings to Grove’s three in 12. The rest of the difference is hard to assess, largely because Grove and Blanton pitched in different leagues, and there was a large, persistent difference between the leagues in the ’30s, with the AL being notably higher-scoring every year. GSDev adjusts for this, of course; Grove’s average park-opponent combo was 0.41 runs per game higher than Blanton’s. That adjustment makes up nearly 60% of the difference in their raw per-game numbers, but that still leaves Blanton ahead. It may genuinely be the relief outings making up the whole difference here.

1937, Gomez vs. Grove. Most of these matchups are pretty close, as you might expect from seasons in which GSDev disagrees with a WAR consensus. This one? GSDev has this as a blowout. So what’s up? Well, first things first; neither pitcher made a relief appearance all year. Gomez had the better ERA (2.33 to 3.02) and FIP (3.29 to 3.44), leading the league in both categories. He also led the league in hits per 9, strikeouts, strikeouts per 9, strikeout to walk ratio, and shutouts. He had two more regular season starts than Grove, and their average innings per start were nearly identical. Grove was in tougher parks, and Gomez ducked the supremely excellent Yankee lineup; accounting for this only narrows his Game Score advantage from 5.9 to 3.5. And that’s before mentioning Gomez’s two complete game wins in the World Series. Sometimes it’s a challenge to figure out what GSDev is thinking; this is not one of those times.

1946, Newhouser vs. Feller. This is a close race in all three systems. It’s also relatively easy to explain. Newhouser was a better pitcher on a per-game basis; he led the AL in both ERA and FIP despite working in a hitters’ park while Feller benefited from the best pitchers’ park in the game. Feller made 8 more starts, totaling 70 additional innings. Both pitchers fare spectacularly well in all three systems, but GSDev narrowly prefers Newhouser’s per-game excellence, while WAR leans toward Feller’s higher volume.

1964, Drysdale vs. Chance. In the season that inverts the last two digits of 1946, the race’s outcome is also inverted. This time, it’s Chance who was better on a per-game basis, and Drysdale who had the extra starts. GSDev picking Drysdale honestly surprises me. Chance’s ERA and FIP were both better, despite being in a similar park in the higher-scoring league. Chance had 11 relief outings, but pitched better as a starter than as a reliever. The other little things that might result in a swing (unearned runs, wild pitches, hit batters) are all in Chance’s favor as well. The only things keeping Drysdale’s per-start numbers close are a higher inning count (8 innings per start to 7.3) and a lower walk rate (1.7 per start to 2.2). Ultimately, Chance leads Drysdale in average adjusted Game Score, 64.9 to 64.1, but that margin is close enough for Drysdale’s 40-35 lead in starts to make the difference. (As a bonus note, Sandy Koufax blew past both of them in average Game Score, with a 67.3, but in only 28 starts.)
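The per-game-versus-volume tradeoff in 1946 and 1964 can be illustrated with a toy model. To be clear, this is not the GSDev formula, which this post doesn't spell out. It simply assumes, for illustration, that a season's value is starts multiplied by the gap between average adjusted Game Score and some fixed baseline, and then asks which baseline would make the two pitchers dead even.

```python
def season_value(starts: int, avg_score: float, baseline: float) -> float:
    """TOY MODEL (an assumption, not GSDev): starts times score above baseline."""
    return starts * (avg_score - baseline)


def break_even_baseline(starts_a, avg_a, starts_b, avg_b) -> float:
    """Baseline at which both pitchers have the same toy season value."""
    if starts_a == starts_b:
        raise ValueError("equal starts: the higher average always wins")
    # Solve starts_a * (avg_a - x) == starts_b * (avg_b - x) for x.
    return (starts_a * avg_a - starts_b * avg_b) / (starts_a - starts_b)


drysdale = (40, 64.1)   # starts, average adjusted Game Score (from the text)
chance = (35, 64.9)
cutoff = break_even_baseline(*drysdale, *chance)
```

In this toy setup the break-even baseline is 58.5. With any baseline below that, Drysdale's five extra starts outweigh Chance's 0.8-point edge per start; above it, Chance comes out ahead. How close this is to what GSDev actually does is an open question, but it shows why a small per-game lead can lose to a modest edge in volume.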

2011, Verlander vs. Halladay. If this were regular-season only, Verlander would win easily; his average adjusted Game Score was higher and he had two additional starts. This lead was likely because of something that has been mentioned a couple of times so far without really being highlighted: hits allowed. GSDev is the only one of these systems that considers hits allowed as a factor. Even including the postseason (in which Halladay pitched well and Verlander struggled somewhat, although he did extend his advantage in starts), Verlander allowed 6.4 hits per 9 innings to Halladay’s 7.8. That was enough to keep pace with Halladay’s superior fielding-independent numbers, which in turn allowed the extra starts to make the difference.
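For a sense of how hits can move a Game Score, here is a sketch using Bill James's original Game Score formula. That choice is an assumption: this post doesn't say which Game Score variant GSDev is built on, and the park and opponent adjustments discussed above are not included. The two starts below are hypothetical, identical except for hits allowed.

```python
def game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Bill James's original Game Score for a single start."""
    innings_completed = outs // 3
    score = 50
    score += outs                                # +1 per out recorded
    score += 2 * max(0, innings_completed - 4)   # +2 per inning completed after the 4th
    score += strikeouts                          # +1 per strikeout
    score -= 2 * hits                            # -2 per hit allowed
    score -= 4 * earned_runs                     # -4 per earned run
    score -= 2 * unearned_runs                   # -2 per unearned run
    score -= walks                               # -1 per walk
    return score


# Two HYPOTHETICAL complete games, identical except for hits allowed.
few_hits = game_score(outs=27, strikeouts=8, hits=5,
                      earned_runs=2, unearned_runs=0, walks=2)
more_hits = game_score(outs=27, strikeouts=8, hits=7,
                       earned_runs=2, unearned_runs=0, walks=2)
```

Under this formula each hit costs two points even when it doesn't lead to a run, so a pitcher who allows noticeably fewer hits per 9 innings can offset an edge in strikeouts, walks, and homers of the sort FIP rewards.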

And that’s all ten of GSDev’s dissents explored. Next time, we’ll go through the remaining category: the 21 seasons in which GSDev, bWAR, and fWAR have all agreed to disagree on the identity of baseball’s best pitcher.

Monday, December 22, 2025
