SABER All I Want To: Starting Pitcher Rankings: Pitcher of the Year Battles Royale

Last time, we moved on from discussing the best pitching careers and began the process of using the GSDev system to evaluate the best starting pitcher of each season from 1901-2022, comparing the results to the readily available bWAR and fWAR systems. Specifically, we examined the cases in which at least two of the three methods agreed on the #1 ranking. This time, we’ll be going the other direction, looking at the years in which the three measurements agree on absolutely nothing.

We’ll take these 21 seasons in groups; fortunately for me, they happen to have arrived in convenient chronological clumps. We’ll start with the deadball years. Each pitcher’s name is followed by his rankings in the other two measures, in column order (GSDev, then bWAR, then fWAR), so “Joe McGinnity (2, 6)” in the bWAR column means he ranked #2 in GSDev and #6 in fWAR:

Year	GSDev Choice	bWAR Choice	fWAR Choice
1903	Christy Mathewson (2, 2)	Joe McGinnity (2, 6)	Rube Waddell (3, 5)
1905	Christy Mathewson (2, 2)	Irv Young (5, 5)	Cy Young (3, 5)
1906	Mordecai Brown (3, 5)	Vic Willis (10, 7)	Al Orth (5, 2)
1910	Ed Walsh (2, 2)	Russ Ford (3, 4)	Walter Johnson (2, 3)

Three of the four GSDev choices finished second in both of the other stats, which is a pretty impressive result. The other year (1906) is worth looking at a bit more. Mordecai Brown actually led the NL in both ERA and FIP in 1906, but gets dinged in bWAR for playing in front of one of the greatest defensive teams ever assembled, and in both WAR systems for a comparatively low innings total of 277.1 (the WAR selections, Al Orth and Vic Willis, both comfortably cleared 300). Brown partly narrowed the innings gap in the World Series, but neither WAR system cares about that (and he finishes #1 in GSDev without the Series starts anyway).

I am rather surprised to see Willis’s 1906 grade out so low in GSDev, but he comes out at the bottom of a pretty tight grouping. His score of 9.09 ranked #10; #7 posted a 9.20, and #4 had a 10.04. 1906 was frankly a strange year overall, with the two best pitchers in baseball around this time (Cy Young and Christy Mathewson) both having unusually lousy seasons. Our three metrics may not agree on who the best pitcher was, but in each of them the 1906 leader posted the lowest MLB-leading score of the first 20 years of the AL-NL era.

After four disagreements in the first ten years of the sample, the three systems produced at least some level of harmony in each of the next sixteen seasons. That brings us to our next group of three-way arguments, which took place in the still-segregated portion of the live-ball era:

Year	GSDev Choice	bWAR Choice	fWAR Choice
1927	Dazzy Vance (2, 3)	Tommy Thomas (3, 28)	Willie Foster (NA, 3)
1929	Firpo Marberry (3, 10)	Willis Hudlin (7, 6)	Chet Brewer (NA, 6)
1936	Carl Hubbell (2, 5)	Lefty Grove (2, 4)	Van Mungo (3, 6)
1938	Red Ruffing (4, 5)	Bill Lee (2, 6)	Lefty Gomez (3, 12)

That is quite a bit more chaotic than the last table. The unfortunate lack of Negro League data for GSDev rears its head again in the cases of Willie Foster and Chet Brewer, though if you’re curious, fWAR ranked Lefty Grove second in both 1927 and 1929, so there’s no hope for consensus in those years regardless of Negro League inclusion.

GSDev’s most contentious choice in this period is Firpo Marberry in 1929. Marberry is actually best known as the first bullpen ace in baseball history, so it should surprise nobody that his ’29 season included 23 relief appearances along with his 26 starts. Those additional outings would usually boost his WAR in comparison to his GSDev scores. In this case, however, he pitched much better as a starter, with a 2.86 ERA as compared to a 4.40 mark in his bullpen efforts (and corresponding differences in his hit and walk rates). Even if you penalize Marberry for the shaky extra outings, the systems won’t agree; GSDev #2 Red Lucas didn’t pitch in relief at all, so his score is fixed, and he wasn’t either WAR system’s choice.

Two other #1 choices had double-digit rankings in an alternate system in this stretch. Lefty Gomez in 1938 gets a bit of an asterisk, as fWAR technically had him tied for first with Paul Derringer (who ranked #4 in GSDev and #3 in bWAR); the real story here is that fWAR just wasn’t impressed with anyone at all in ’38, as Gomez and Derringer’s MLB-leading figures were a mere 5.0. And that leaves Tommy Thomas’s 1927, with bWAR evaluating him as baseball’s best pitcher while fWAR thinks he wasn’t even an ace. If you compare Thomas to Lefty Grove, fWAR sees Grove posting an FIP a full run lower in a tougher park, which is enough to easily overcome a bit of an innings deficit. However, the #28 finish is still a surprise for a high-volume, fairly effective pitcher. Looking at Thomas’s FanGraphs player page, it appears there are several seasons in which his FIP numbers have significant discrepancies compared to the ones displayed on Baseball Reference (while some of the other seasons have exact matches). I’m not sure what the source of that issue is, but given that bWAR and GSDev are similarly optimistic on Thomas’s efforts, I suspect there’s something funky going on with fWAR here.

Our last major gap between disputed years lasted from 1910 to 1927. This one, from 1938 to 1960, is even longer; the three systems managed to partially pacify themselves until expansion was on the horizon:

Year	GSDev Choice	bWAR Choice	fWAR Choice
1960	Don Drysdale (2, 3)	Ernie Broglio (4, 12)	Bob Friend (3, 6)
1961	Whitey Ford (24, 4)	Don Cardwell (17, 12)	Jim Bunning (5, 22)
1962	Bob Gibson (6, 4)	Hank Aguirre (4, 9)	Camilo Pascual (5, 11)

After 20-plus years of peace, the early ’60s provide three consecutive seasons of complete disharmony. This is particularly highlighted by 1961, a year which escalates the petty squabbles of other seasons to near-blood-feud levels.

As has been the case in several other disputed years, 1961 was a campaign that produced few standout starters. Whitey Ford’s GSDev score was the fourth-lowest #1 score in the 122 years of data we have (with one of the three lower winning scores coming in 1929, another year we’ve already seen in this group). And even that relatively unimpressive chart-topper was driven in part by 14 scoreless innings over two World Series starts, without which he’d have finished in third and Camilo Pascual (#5 in bWAR, #6 in fWAR) would have claimed the top spot.

But that doesn’t explain bWAR’s intransigence. GSDev and fWAR are comparatively cordial in 1961, both placing the other’s selection reasonably high on the list. Contrariwise, bWAR boots both Ford and Jim Bunning out of its top 20, and opts instead for Don Cardwell, who places outside the top 10 in each of the other systems. As you might expect, this comes entirely down to the fielding adjustment. Cardwell’s Cubs apparently cost him nearly half a run per 9 innings, while Ford’s Yankees were better than average by a similar margin and Bunning’s Tigers were nearly as good. And yet, Cardwell’s BABIP allowed was .271, lower than the MLB average, and only six points higher than Ford’s .265 (Bunning’s was .259). FIP tells a similar tale; Cardwell’s 3.61 was an improvement on his ERA (3.82), but still worse than Bunning’s 3.23 or Ford’s 3.14 (or Pascual’s 3.39, or Sandy Koufax’s MLB-leading 3.00). So yeah, I share in the skepticism of bWAR’s choice here; 1961 was close enough to be anyone’s year, but “anyone” probably still shouldn’t be someone who was merely #10 in the NL in FIP, and outside the top 10 in ERA, WHIP, and basically every other rate category.

As a side note, Bunning’s #1 ranking in fWAR in 1961 was actually a tie with Koufax, who was the only pitcher to rank in the top 5 in all three systems that year. So the dispute wouldn’t look quite as fervent if Koufax were first in alphabetical order (which, as best I can tell, is how both B-R and FanGraphs sort tied pitchers).

1962 was a year in which I expected to see more discord, as bWAR opted for a pitcher who made 22 starts and 20 relief appearances. But those 22 starts helped Aguirre to an MLB-leading ERA and an AL-leading FIP, and he was generally a bit better as a starter than as a reliever, so GSDev is impressed despite accounting for just under 80% of his work, and fWAR’s complaints are not as vociferous as they might have been.

Our next jump forward is short enough that I was tempted to combine the early ’60s group with the next one, but this table is going to be big enough as it is, and the relatively short gap does cover some notable changes in the league, including the additions of four new teams and a round of the playoffs, plus a reduction in mound height. On we go to the middle of the expansion era:

Year	GSDev Choice	bWAR Choice	fWAR Choice
1971	Tom Seaver (2, 2)	Wilbur Wood (4, 4)	Fergie Jenkins (3, 3)
1974	Gaylord Perry (2, 9)	Jon Matlack (3, 3)	Bert Blyleven (6, 4)
1976	Frank Tanana (3, 4)	Mark Fidrych (4, 12)	Vida Blue (2, 2)
1977	Tom Seaver (4, 5)	Rick Reuschel (10, 4)	Dennis Leonard (3, 13)
1981	Fernando Valenzuela (3, 2)	Bert Blyleven (7, 6)	Steve Carlton (2, 2)
1982	Mario Soto (3, 2)	Steve Rogers (2, 3)	Steve Carlton (3, 7)
1983	Mario Soto (3, 13)	John Denny (7, 3)	Steve Carlton (6, 5)

Wow, does fWAR love Steve Carlton in the early ’80s. (Everyone else does too, but fWAR loves him the most.) It’s worth observing here that bWAR’s #1 spot in 1982 was a tie between Steve Rogers and Dave Stieb (who was #7 in GSDev and #10 in fWAR).

Overall, this group is much calmer than the last one. The main points of contention are Mark Fidrych in ’76 (who famously posted an MLB-best 2.34 ERA despite striking out fewer than 4 batters per 9 innings, a very low rate even for the time), Dennis Leonard in ’77 (whose Royals apparently had a good defense despite Leonard allowing 18 unearned runs and having an FIP solidly lower than his ERA), and Mario Soto in ’83 (whose FIP jumped by nearly a run from the year before while his ERA and hit rate both went down).

And from there, we get to another drought, followed by our final grouping:

Year	GSDev Choice	bWAR Choice	fWAR Choice
2003	Jason Schmidt (6, 5)	Roy Halladay (4, 3)	Mark Prior (3, 4)
2007	Josh Beckett (2, 4)	Roy Oswalt (20, 11)	Jake Peavy (3, 7)
2008	Roy Halladay (7, 4)	Tim Lincecum (2, 2)	CC Sabathia (4, 5)

Roy Oswalt was a terrific pitcher; the GSDev career rankings see him as a borderline Hall of Fame candidate. But in 2007, he had the highest WHIP and lowest K rate of the first decade of his career, the highest walk rate of any season until his last, and his lowest K/BB ratio ever. Houston’s defense that year was terrible (according to bWAR), despite having been good in the surrounding seasons, and despite Oswalt’s ERA being much better than his FIP (the team as a whole was almost exactly even in that regard, 4.70 ERA to 4.73 FIP). So as good as Oswalt is generally, I’m not buying this particular year as best-in-baseball material.

2008 is at least a mildly interesting year for one particular reason: it is one of two seasons in which the #1 ranking in GSDev changed hands due to the addition of wild pitches to the Game Score calculation. Roy Halladay and Tim Lincecum had a close race; Halladay threw only 4 wild pitches while Lincecum led the majors with 17. That race was also very nearly decided by the postseason, as #4 Sabathia got blown up in his lone playoff start; he would have finished #2 ahead of Lincecum otherwise (and a good October start would have pushed him to #1).

This is our last group of discordant seasons; since 2008, at least two of the three systems have agreed every year (with one near-exception in 2016 when Clayton Kershaw led in GSDev and managed a tie for first in fWAR). For what it’s worth, this appears very likely to still be the case through 2025, as the provisional leaders in GSDev in ’23, ’24 and ’25 all led in one WAR or the other (though not both; bWAR and fWAR haven’t agreed on a full-length season leader since 2013).

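The numbers overall point to a trend similar to what was shown last time: GSDev agrees more closely with each WAR system than the two WAR systems do with each other. Here are the median rankings that each system gave to the other systems’ leaders across these 21 disputed seasons (rows are the system doing the ranking, columns are the system whose leader is being ranked):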

	GSDev Leader	bWAR Leader	fWAR Leader
GSDev Rank	-	4	3
bWAR Rank	3	-	5
fWAR Rank	4	6	-

The GSDev-associated rankings in both directions clustered around 3 and 4, while the WAR systems placed each other’s leaders a couple spots lower. Averages are trickier to calculate, given the Negro League seasons involved, but even factoring those in, bWAR and fWAR are still more hostile to each other than they are to GSDev. Those differences also persist regardless of which pitcher you select from the tied seasons mentioned earlier in the post (’38 and ’61 in fWAR, ’82 in bWAR). Read into that what you will, but I’m comfortable interpreting it as GSDev taking fewer outlandish positions in seasonal rankings than its counterparts.
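
For anyone who wants to check the medians, here is a minimal Python sketch that recomputes them from the five tables in this post. The `ROWS` layout, the `SYSTEMS` tuple, and the `median_ranks` helper are names made up for this illustration; they are not part of GSDev or either WAR system. Tied seasons use whichever pitcher is listed in the tables, and the two Negro League seasons without a GSDev rank are simply skipped.

```python
from statistics import median

SYSTEMS = ("GSDev", "bWAR", "fWAR")

# year -> ranks of the (GSDev, bWAR, fWAR) leaders in the two *other* systems,
# in column order, copied from the tables above. None = no GSDev rank available.
ROWS = {
    1903: ((2, 2), (2, 6), (3, 5)),
    1905: ((2, 2), (5, 5), (3, 5)),
    1906: ((3, 5), (10, 7), (5, 2)),
    1910: ((2, 2), (3, 4), (2, 3)),
    1927: ((2, 3), (3, 28), (None, 3)),
    1929: ((3, 10), (7, 6), (None, 6)),
    1936: ((2, 5), (2, 4), (3, 6)),
    1938: ((4, 5), (2, 6), (3, 12)),
    1960: ((2, 3), (4, 12), (3, 6)),
    1961: ((24, 4), (17, 12), (5, 22)),
    1962: ((6, 4), (4, 9), (5, 11)),
    1971: ((2, 2), (4, 4), (3, 3)),
    1974: ((2, 9), (3, 3), (6, 4)),
    1976: ((3, 4), (4, 12), (2, 2)),
    1977: ((4, 5), (10, 4), (3, 13)),
    1981: ((3, 2), (7, 6), (2, 2)),
    1982: ((3, 2), (2, 3), (3, 7)),
    1983: ((3, 13), (7, 3), (6, 5)),
    2003: ((6, 5), (4, 3), (3, 4)),
    2007: ((2, 4), (20, 11), (3, 7)),
    2008: ((7, 4), (2, 2), (4, 5)),
}


def median_ranks(rows):
    """Return {(ranking_system, leader_system): median rank}."""
    buckets = {}
    for leaders in rows.values():
        for leader_idx, pair in enumerate(leaders):
            # The two systems that did NOT pick this pitcher, in column order.
            others = [s for i, s in enumerate(SYSTEMS) if i != leader_idx]
            for ranker, rank in zip(others, pair):
                if rank is not None:  # skip seasons with no rank in that system
                    key = (ranker, SYSTEMS[leader_idx])
                    buckets.setdefault(key, []).append(rank)
    return {key: median(ranks) for key, ranks in buckets.items()}


if __name__ == "__main__":
    for (ranker, leader), value in sorted(median_ranks(ROWS).items()):
        print(f"{ranker} rank of {leader} leader: {value}")
```

Each key pairs the system doing the ranking with the system whose leader is being ranked, so `("fWAR", "bWAR")` is the median fWAR ranking of bWAR’s leaders, which works out to the 6 shown in the table.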

With that, we bring this series of posts to a close, at least for now. I’ll plan to revisit the GSDev rankings when 1898-1900 numbers become available (hopefully later this offseason), and may touch on some other side topics at that time. But for now, we’ve reached a natural stopping point. If time permits, my next post(s) will probably be an update to the positional Weighted WAR rankings covering changes due to the 2025 season.

Monday, December 29, 2025