Monday, December 29, 2025

Starting Pitcher Rankings: Pitcher of the Year Battles Royale

Last time, we moved on from discussing the best pitching careers and began the process of using the GSDev system to evaluate the best starting pitcher of each season from 1901-2022, comparing the results to the readily available bWAR and fWAR systems. Specifically, we examined the cases in which at least two of the three methods agreed on the #1 ranking. This time, we’ll be going the other direction, looking at the years in which the three measurements agree on absolutely nothing.

We’ll take these 21 seasons in groups; fortunately for me, they happen to have arrived in convenient chronological clumps. We’ll start with the deadball years. Pitcher names are followed by their ranking in the other two measures (in the same order as the columns, GSDev then bWAR then fWAR):

Year | GSDev Choice | bWAR Choice | fWAR Choice
1903 | Christy Mathewson (2, 2) | Joe McGinnity (2, 6) | Rube Waddell (3, 5)
1905 | Christy Mathewson (2, 2) | Irv Young (5, 5) | Cy Young (3, 5)
1906 | Mordecai Brown (3, 5) | Vic Willis (10, 7) | Al Orth (5, 2)
1910 | Ed Walsh (2, 2) | Russ Ford (3, 4) | Walter Johnson (2, 3)

Three of the four GSDev choices finished second in both of the other stats, which is a pretty impressive result. The other year (1906) is worth looking at a bit more. Mordecai Brown actually led the NL in both ERA and FIP in 1906, but gets dinged in bWAR for playing in front of one of the greatest defensive teams ever assembled, and in both WAR systems for a comparatively low innings total of 277.1 (the WAR selections, Al Orth and Vic Willis, both comfortably cleared 300). Brown partly narrowed the innings gap in the World Series, but neither WAR system cares about that (and he finishes #1 in GSDev without the Series starts anyway).

I am rather surprised to see Willis’s 1906 grade out so low in GSDev, but he comes out at the bottom of a pretty tight grouping. His score of 9.09 ranked #10; #7 posted a 9.20, and #4 had a 10.04. 1906 was frankly a strange year overall, with the two best pitchers in baseball around this time (Cy Young and Christy Mathewson) both having unusually lousy seasons. Our three metrics may not agree on who the best pitcher was, but they are unanimous in their agreement that the MLB leader in 1906 produced the lowest league-leading score of the first 20 years of the AL-NL era.

After four disagreements in the first ten years of the sample, the three systems produced at least some level of harmony in each of the next sixteen seasons. That brings us to our next group of three-way arguments, which took place in the still-segregated portion of the live-ball era:

Year | GSDev Choice | bWAR Choice | fWAR Choice
1927 | Dazzy Vance (2, 3) | Tommy Thomas (3, 28) | Willie Foster (NA, 3)
1929 | Firpo Marberry (3, 10) | Willis Hudlin (7, 6) | Chet Brewer (NA, 6)
1936 | Carl Hubbell (2, 5) | Lefty Grove (2, 4) | Van Mungo (3, 6)
1938 | Red Ruffing (4, 5) | Bill Lee (2, 6) | Lefty Gomez (3, 12)

That is quite a bit more chaotic than the last table. The unfortunate lack of Negro League data for GSDev rears its head again in the cases of Willie Foster and Chet Brewer, though if you’re curious, fWAR ranked Lefty Grove second in both 1927 and 1929, so there’s no hope for consensus in those years regardless of Negro League inclusion.

GSDev’s most contentious choice in this period is Firpo Marberry in 1929. Marberry is actually best known as the first bullpen ace in baseball history, so it should surprise nobody that his ’29 season included 23 relief appearances along with his 26 starts. Those additional outings would usually boost his WAR in comparison to his GSDev scores. In this case, however, he pitched much better as a starter, with a 2.86 ERA as compared to a 4.40 mark in his bullpen efforts (and corresponding differences in his hit and walk rates). Even if you penalize Marberry for the shaky extra outings, the systems won’t agree; GSDev #2 Red Lucas didn’t pitch in relief at all, so his score is fixed.

Two other #1 choices had double-digit rankings in an alternate system in this stretch. Lefty Gomez in 1938 gets a bit of an asterisk, as fWAR technically had him tied for first with Paul Derringer (who ranked #4 and #3, respectively); the real story here is that fWAR just wasn’t impressed with anyone at all in ’38, as Gomez and Derringer’s MLB-leading figures were a mere 5.0. And that leaves Tommy Thomas’s 1927, with bWAR evaluating him as baseball’s best pitcher while fWAR thinks he wasn’t even an ace. If you compare Thomas to Lefty Grove, fWAR sees Grove posting an FIP a full run lower in a tougher park, which is enough to easily overcome a bit of an innings deficit. However, the #28 finish is still a surprise for a high-volume, fairly effective pitcher. Looking at Thomas’s Fangraphs player page, it appears there are several seasons in which his FIP numbers have significant discrepancies compared to the ones displayed on Baseball Reference (while some of the other seasons have exact matches). I’m not sure what the source of that issue is, but given that bWAR and GSDev are similarly optimistic on Thomas’s efforts, I suspect there’s something funky going on with fWAR here.

Our last major gap between disputed years lasted from 1910 to 1927. This one is even longer; the three systems managed to partially pacify themselves until expansion was on the horizon:

Year | GSDev Choice | bWAR Choice | fWAR Choice
1960 | Don Drysdale (2, 3) | Ernie Broglio (4, 12) | Bob Friend (3, 6)
1961 | Whitey Ford (24, 4) | Don Cardwell (17, 12) | Jim Bunning (5, 22)
1962 | Bob Gibson (6, 4) | Hank Aguirre (4, 9) | Camilo Pascual (5, 11)

After 20-plus years of peace, the early ’60s provide three consecutive seasons of complete disharmony. This is particularly highlighted by 1961, a year which escalates the petty squabbles of other seasons to near-blood feud levels.

As has been the case in several other disputed years, 1961 was a campaign that produced few standout starters. Whitey Ford’s GSDev score was the fourth-lowest #1 score in the 122 years of data we have (with one of the three lower winning scores coming in 1929, another year we’ve already seen in this group). And even that relatively unimpressive chart-topper was driven in part by 14 scoreless innings over two World Series starts, without which he’d have finished in third and Camilo Pascual (#5 in bWAR, #6 in fWAR) would have claimed the top spot.

But that doesn’t explain bWAR’s intransigence. GSDev and fWAR are comparatively cordial in 1961, both placing the other’s selection reasonably high on the list. Contrariwise, bWAR boots both Ford and Jim Bunning out of its top 20, and opts instead for Don Cardwell, who places outside the top 10 in each of the other systems. As you might expect, this comes entirely down to the fielding adjustment. Cardwell’s Cubs apparently cost him nearly half a run per 9 innings, while Ford’s Yankees were better than average by a similar margin and Bunning’s Tigers were nearly as good. And yet, Cardwell’s BABIP allowed was .271, lower than the MLB average, and only six points higher than Ford’s .265 (Bunning’s was .259). FIP tells a similar tale; Cardwell’s 3.61 was an improvement on his ERA (3.82), but still worse than Bunning’s 3.23 or Ford’s 3.14 (or Pascual’s 3.39, or Sandy Koufax’s MLB-leading 3.00). So yeah, I share in the skepticism of bWAR’s choice here; 1961 was close enough to be anyone’s year, but “anyone” probably still shouldn’t be someone who was merely #10 in the NL in FIP, and outside the top 10 in ERA, WHIP, and basically every other rate category.

As a side note, Bunning’s #1 ranking in fWAR in 1961 was actually a tie with Koufax, who was the only pitcher to rank in the top 5 in all three systems that year. So the dispute wouldn’t look quite as fervent if Koufax were first in alphabetical order (which, best I can tell, is how both B-R and Fangraphs sort tied pitchers).

1962 was a year in which I expected to see more discord, as bWAR opted for a pitcher who made 22 starts and 20 relief appearances. But those 22 starts helped Aguirre to an MLB-leading ERA and an AL-leading FIP, and he was generally a bit better as a starter than as a reliever, so GSDev is impressed despite accounting for just under 80% of his work, and fWAR’s complaints are not as vociferous as they might have been.

Our next jump forward is short enough that I was tempted to combine the early ’60s group with the next one, but this table is going to be big enough as it is, and the relatively short gap does cover some notable changes in the league, including the additions of four new teams and a round of the playoffs, plus a reduction in mound height. On we go to the middle of the expansion era:

Year | GSDev Choice | bWAR Choice | fWAR Choice
1971 | Tom Seaver (2, 2) | Wilbur Wood (4, 4) | Fergie Jenkins (3, 3)
1974 | Gaylord Perry (2, 9) | Jon Matlack (3, 3) | Bert Blyleven (6, 4)
1976 | Frank Tanana (3, 4) | Mark Fidrych (4, 12) | Vida Blue (2, 2)
1977 | Tom Seaver (4, 5) | Rick Reuschel (10, 4) | Dennis Leonard (3, 13)
1981 | Fernando Valenzuela (3, 2) | Bert Blyleven (7, 6) | Steve Carlton (2, 2)
1982 | Mario Soto (3, 2) | Steve Rogers (2, 3) | Steve Carlton (3, 7)
1983 | Mario Soto (3, 13) | John Denny (7, 3) | Steve Carlton (6, 5)

Wow, does fWAR love Steve Carlton in the early ’80s. (Everyone else does too, but fWAR loves him the most.) It’s worth observing here that bWAR’s #1 spot in 1982 was a tie between Steve Rogers and Dave Stieb (who was #7 in GSDev and #10 in fWAR).

Overall, this group is much calmer than the last one. The main points of contention are Mark Fidrych in ’76 (who famously posted an MLB-best 2.34 ERA despite striking out fewer than 4 batters per 9 innings, a very low rate even for the time), Dennis Leonard in ’77 (whose Royals apparently had a good defense despite Leonard allowing 18 unearned runs and having an FIP solidly lower than his ERA), and Mario Soto in ’83 (whose FIP jumped by nearly a run from the year before while his ERA and hit rate both went down).

And from there, we get to another drought, followed by our final grouping:

Year | GSDev Choice | bWAR Choice | fWAR Choice
2003 | Jason Schmidt (6, 5) | Roy Halladay (4, 3) | Mark Prior (3, 4)
2007 | Josh Beckett (2, 4) | Roy Oswalt (20, 11) | Jake Peavy (3, 7)
2008 | Roy Halladay (7, 4) | Tim Lincecum (2, 2) | CC Sabathia (4, 5)

Roy Oswalt was a terrific pitcher; the GSDev career rankings see him as a borderline Hall of Fame candidate. But in 2007, he had the highest WHIP and lowest K rate of the first decade of his career, the highest walk rate of any season until his last, and his lowest K/BB ratio ever. Houston’s defense that year was terrible (according to bWAR), despite having been good in the surrounding seasons, and despite Oswalt’s ERA being much better than his FIP (the team as a whole was almost exactly even in that regard, 4.70 ERA to 4.73 FIP). So as good as Oswalt is generally, I’m not buying this particular year as best-in-baseball material.

2008 is at least a mildly interesting year for one particular reason: it is one of two seasons in which the #1 ranking in GSDev changed hands due to the addition of wild pitches to the Game Score calculation. Roy Halladay and Tim Lincecum had a close race; Halladay threw only 4 wild pitches while Lincecum led the majors with 17. That race was also very nearly decided by the postseason, as #4 Sabathia got blown up in his lone playoff start; he would have finished #2 ahead of Lincecum otherwise (and a good October start would have pushed him to #1).

This is our last group of discordant seasons; since 2008, at least two of the three systems have agreed every year (with one near-exception in 2016 when Clayton Kershaw led in GSDev and managed a tie for first in fWAR). For what it’s worth, this appears very likely to still be the case through 2025, as the provisional leaders in GSDev in ’23, ’24 and ’25 all led in one WAR or the other (though not both; bWAR and fWAR haven’t agreed on a full-length season leader since 2013).

The numbers overall point to a trend similar to what was shown last time: GSDev is more agreeable with each WAR system than they are with each other. Here is a table of median rankings of unanimously disagreed-on champions in alternate systems:

 

Median rank in | Dev Leader | bWAR Leader | fWAR Leader
Dev Rank | - | 4 | 3
bWAR Rank | 3 | - | 5
fWAR Rank | 4 | 6 | -

The GSDev-associated rankings in both directions clustered around 3 and 4, while the WAR systems placed each other’s leaders a couple spots lower. Averages are trickier to calculate, given the Negro League seasons involved, but even factoring those in, bWAR and fWAR are still more hostile to each other than they are to GSDev. Those differences also persist regardless of which pitcher you select from the tied seasons mentioned earlier in the post (’38 and ’61 in fWAR, ’82 in bWAR). Read into that what you will, but I’m comfortable interpreting it as GSDev taking fewer outlandish positions in seasonal rankings than its counterparts.
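For anyone who wants to check those medians, here is a rough Python sketch that rebuilds them from the parenthetical ranks in the five season tables in this post. The tuple layout and the names `SEASONS` and `median_rank` are mine, made up for illustration. `None` stands in for the two Negro League seasons with no GSDev rank, and dropping them from the affected median is an assumption about how to treat them. Tied seasons use whichever pitcher is listed in the tables.

```python
from statistics import median

# One row per three-way-disagreement season, transcribed from the tables above.
# Layout (hypothetical, for illustration only):
#   (year,
#    GSDev leader's (bWAR rank, fWAR rank),
#    bWAR leader's  (GSDev rank, fWAR rank),
#    fWAR leader's  (GSDev rank, bWAR rank))
# None = no GSDev rank available ("NA" in the table).
SEASONS = [
    (1903, (2, 2), (2, 6), (3, 5)),
    (1905, (2, 2), (5, 5), (3, 5)),
    (1906, (3, 5), (10, 7), (5, 2)),
    (1910, (2, 2), (3, 4), (2, 3)),
    (1927, (2, 3), (3, 28), (None, 3)),
    (1929, (3, 10), (7, 6), (None, 6)),
    (1936, (2, 5), (2, 4), (3, 6)),
    (1938, (4, 5), (2, 6), (3, 12)),
    (1960, (2, 3), (4, 12), (3, 6)),
    (1961, (24, 4), (17, 12), (5, 22)),
    (1962, (6, 4), (4, 9), (5, 11)),
    (1971, (2, 2), (4, 4), (3, 3)),
    (1974, (2, 9), (3, 3), (6, 4)),
    (1976, (3, 4), (4, 12), (2, 2)),
    (1977, (4, 5), (10, 4), (3, 13)),
    (1981, (3, 2), (7, 6), (2, 2)),
    (1982, (3, 2), (2, 3), (3, 7)),
    (1983, (3, 13), (7, 3), (6, 5)),
    (2003, (6, 5), (4, 3), (3, 4)),
    (2007, (2, 4), (20, 11), (3, 7)),
    (2008, (7, 4), (2, 2), (4, 5)),
]


def median_rank(leader_col, rank_idx):
    """Median of one rank column, skipping seasons with no rank (None)."""
    ranks = [row[leader_col][rank_idx] for row in SEASONS]
    return median([r for r in ranks if r is not None])


medians = {
    "GSDev leader in bWAR": median_rank(1, 0),
    "GSDev leader in fWAR": median_rank(1, 1),
    "bWAR leader in GSDev": median_rank(2, 0),
    "bWAR leader in fWAR": median_rank(2, 1),
    "fWAR leader in GSDev": median_rank(3, 0),
    "fWAR leader in bWAR": median_rank(3, 1),
}

for label, value in medians.items():
    print(f"{label}: {value}")
```

Skipping the NA seasons leaves 19 values rather than 21 in the "fWAR leader in GSDev" column; the other five columns use all 21 seasons.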

With that, we bring this series of posts to a close, at least for now. I’ll plan to revisit the GSDev rankings when 1898-1900 numbers become available (hopefully later this offseason), and may touch on some other side topics at that time. But for now, we’ve reached a natural stopping point. If time permits, my next post(s) will probably be an update to the positional Weighted WAR rankings covering changes due to the 2025 season.

Monday, December 22, 2025

Starting Pitcher Ratings: Starting Pitcher of the Year

So far in this series, we’ve used the GSDev rating system to examine the best starting pitching seasons and careers (over the course of three parts) since the founding of the American League. Now, we’ll be looking at the best pitchers year by year, and doing so in comparison to the systems whose deficiencies inspired us to take a fresh look in the first place: bWAR and fWAR.

We have 122 seasons of essentially complete GSDev data, stretching from 1901-2022. When considering the best pitcher according to all three systems, we can split those seasons into five categories: all systems disagree, GSDev and bWAR agree, GSDev and fWAR agree, bWAR and fWAR agree, and all systems agree. There’s a bit of a paradox in this analysis, in that the option that produces the most interesting efforts by the individual players is probably the least interesting to look at in comparing the systems. So let’s start with the years in which all three systems select the same best pitcher. There are 46 such seasons:

Year | Pitcher | GSDev | bWAR | fWAR
1901 | Cy Young | 14.43 | 12.4 | 7.8
1902 | Cy Young | 13.01 | 10.1 | 7.7
1911 | Ed Walsh | 14.37 | 9.2 | 7.6
1912 | Walter Johnson | 17.40 | 14.3 | 9.3
1913 | Walter Johnson | 18.48 | 15.2 | 8.5
1918 | Walter Johnson | 15.95 | 10.5 | 6.5
1919 | Walter Johnson | 15.46 | 10.8 | 6.8
1923 | Dolf Luque | 16.22 | 10.7 | 6.7
1924 | Dazzy Vance | 18.82 | 10.5 | 7.7
1928 | Dazzy Vance | 16.81 | 10.1 | 6.9
1930 | Lefty Grove | 15.98 | 10.4 | 8.3
1931 | Lefty Grove | 18.83 | 10.4 | 7.3
1932 | Lefty Grove | 15.57 | 9.5 | 7.0
1934 | Dizzy Dean | 15.85 | 8.9 | 6.5
1939 | Bob Feller | 16.92 | 9.2 | 6.5
1942 | Mort Cooper | 14.44 | 8.2 | 6.5
1943 | Spud Chandler | 15.37 | 6.4 | 6.3
1944 | Dizzy Trout | 14.74 | 9.3 | 7.3
1945 | Hal Newhouser | 17.17 | 11.3 | 8.0
1948 | Harry Brecheen | 14.05 | 8.7 | 7.7
1949 | Mel Parnell | 13.56 | 8.0 | 6.9
1950 | Ewell Blackwell | 11.91 | 7.5 | 6.4
1951 | Robin Roberts | 12.61 | 8.0 | 6.7
1953 | Robin Roberts | 15.98 | 9.8 | 8.4
1954 | Robin Roberts | 14.56 | 9.0 | 7.1
1957 | Frank Sullivan | 13.37 | 6.4 | 6.4
1959 | Camilo Pascual | 13.78 | 7.8 | 7.6
1963 | Sandy Koufax | 18.12 | 10.7 | 9.2
1966 | Sandy Koufax | 18.49 | 10.3 | 9.1
1968 | Bob Gibson | 20.75 | 11.2 | 8.6
1970 | Bob Gibson | 15.68 | 8.9 | 9.8
1972 | Steve Carlton | 18.49 | 12.1 | 11.1
1980 | Steve Carlton | 21.25 | 10.2 | 8.8
1985 | Dwight Gooden | 19.5 | 12.2 | 8.9
1987 | Roger Clemens | 17.11 | 9.4 | 8.4
1989 | Bret Saberhagen | 16.47 | 9.7 | 7.5
1990 | Roger Clemens | 15.87 | 10.4 | 8.2
1994 | Greg Maddux | 17.35 | 8.5 | 7.4
1997 | Roger Clemens | 20.41 | 11.9 | 10.7
1998 | Kevin Brown | 17.77 | 8.6 | 9.6
1999 | Pedro Martinez | 22.60 | 9.8 | 11.6
2001 | Randy Johnson | 22.14 | 10.1 | 10.4
2006 | Johan Santana | 16.05 | 7.6 | 6.7
2009 | Zack Greinke | 17.04 | 10.4 | 8.7
2012 | Justin Verlander | 16.18 | 8.1 | 6.9
2013 | Clayton Kershaw | 16.96 | 8.1 | 7.2

Unsurprisingly, that list contains some pretty terrific seasons. It also includes several years in which the victorious effort was relatively unimpressive, but the competition was even worse. Still, the nine pitchers with multiple unanimous wins are an impressive group: four for Walter Johnson, three for Roger Clemens, Lefty Grove, and Robin Roberts, and two apiece for Bob Gibson, Steve Carlton, Dazzy Vance, Cy Young, and Sandy Koufax. And of course, the greatest seasons ever have considerable representation on the list; among the top 27 GSDev scores to date (anything over 18), we see W. Johnson ’12 and ’13, Vance ’24, Grove ’31, Koufax ’63 and ’66, Gibson ’68, Carlton ’72 and ’80, Gooden ’85, Clemens ’97, Pedro ’99, and R. Johnson ’01.

That list, however, is not complete. Out of the 23 league-leading GSDev totals that exceeded 18, only 14 were unanimously acclaimed as the best pitcher in baseball. Notable omissions include Ron Guidry 1978, Grover Cleveland Alexander 1915, Greg Maddux 1995, Sandy Koufax 1965, and most startling of all, Pedro Martinez 2000 – the highest single-season GSDev score on record.
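The five categories laid out at the top of this post come down to comparing three names per season. A minimal Python sketch of that comparison follows. The function name `classify` and the category labels are my own shorthand, not anything built into GSDev or either WAR system. It assumes each system supplies exactly one name, so tied seasons have to be resolved by hand before they go in. The example seasons are taken from the tables in this series.

```python
def classify(gsdev, bwar, fwar):
    """Return which of the five agreement categories a season falls into.

    Each argument is the name of that system's top-ranked pitcher.
    """
    if gsdev == bwar == fwar:
        return "all agree"
    if gsdev == fwar:
        return "bWAR disagrees"   # GSDev and fWAR agree
    if gsdev == bwar:
        return "fWAR disagrees"   # GSDev and bWAR agree
    if bwar == fwar:
        return "GSDev disagrees"  # the two WARs agree
    return "everyone disagrees"


# (GSDev choice, bWAR choice, fWAR choice) for one season of each type.
examples = {
    1901: ("Cy Young", "Cy Young", "Cy Young"),
    1965: ("Sandy Koufax", "Juan Marichal", "Sandy Koufax"),
    2000: ("Pedro Martinez", "Pedro Martinez", "Randy Johnson"),
    1904: ("Jack Chesbro", "Rube Waddell", "Rube Waddell"),
    1961: ("Whitey Ford", "Don Cardwell", "Jim Bunning"),
}

for year, leaders in examples.items():
    print(year, classify(*leaders))
```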

That brings us to the topic of disagreements. With 46 seasons of unanimity, that leaves 76 years of squabbling (including every season since 2013 – a streak that has continued through 2025, as bWAR and fWAR don’t agree in any of the years for which we don’t yet have final GSDev numbers). How do those seasons break down?

Disagreer | Years
bWAR | 25
fWAR | 20
GSDev | 10
Everyone | 21

That’s a relatively even split between bWAR and fWAR, a reasonable chunk of unanimous agreement to disagree… and remarkably few in which GSDev is the lone holdout. Let’s go through these categories one at a time, starting with our most common objector.

Year | fWAR/GSDev Choice | bWAR Rank | bWAR Choice
1909 | Mordecai Brown | 2 | Christy Mathewson
1915 | Grover Alexander | 2 | Walter Johnson
1916 | Walter Johnson | 2 | Grover Alexander
1920 | Stan Coveleski | 3 | Grover Alexander
1941 | Whit Wyatt | 3 | Thornton Lee
1947 | Ewell Blackwell | 2 | Warren Spahn
1956 | Herb Score | 2 | Early Wynn
1958 | Sam Jones | 2 | Frank Lary
1965 | Sandy Koufax | 4 | Juan Marichal
1978 | Ron Guidry | 2 | Phil Niekro
1979 | JR Richard | 8 | Phil Niekro
1984 | Dwight Gooden | 4 | Dave Stieb
1986 | Mike Scott | 3 | Teddy Higuera
1988 | Roger Clemens | 5 | Mark Gubicza
1991 | Roger Clemens | 2 | Tom Glavine
1996 | John Smoltz | 4 | Pat Hentgen
2002 | Curt Schilling | 2 | Randy Johnson
2004 | Randy Johnson | 2 | Johan Santana
2005 | Johan Santana | 3 | Roger Clemens
2014 | Clayton Kershaw | 2 | Corey Kluber
2015 | Clayton Kershaw | 3 | Zack Greinke
2016 | Clayton Kershaw* | 3 | Justin Verlander
2018 | Jacob deGrom | 2 | Aaron Nola
2019 | Gerrit Cole | 5 | Mike Minor
2021 | Corbin Burnes | 10 | Zack Wheeler

That’s 25 years in which bWAR was the lone holdout (depending on how you look at 2016, in which Kershaw was tied for first in fWAR with Jose Fernandez). In twelve of them, the fWAR/GSDev choice ranked second, which is generally understandable. Throw in another six third-place finishes, which usually aren’t too out there. (Even the near misses are amusing at times, like with the Alexander/Johnson switcheroo in 1915-16, or the Schilling-Johnson-Santana-Clemens ring-around-the-rosie in the early 2000s.)

There are some oddballs, though. We’ve talked about 2019 Gerrit Cole and 2021 Corbin Burnes during the early posts in this series; those were two of the bigger bWAR discrepancies of the bunch. As far as the other fourth-and-below finishes go: Smoltz in ’96 is a fairly standard bWAR/fWAR tiff (FIP lower than ERA despite a good-fielding team), with a side of brilliant postseason work. Despite finishing in fifth place, Clemens in ’88 is a surprisingly close contender, less than a win back of first. Gooden in ’84 had a shocking-for-the-time 11.4 K/9 in his rookie year, thereby producing one of the best FIP seasons ever; his ERA was nearly a run higher.

Richard in ’79 surprises me. Yes, his FIP is lower than his ERA by half a run – but he still led the majors in ERA. Houston was a pitcher’s park, but not a crazy one (Richard’s personal park factor for the year was 93, according to B-R, higher than any he’d had since 1975). bWAR thinks the Astros’ defense was good that year, but Richard’s BABIP allowed was barely different from average (and that difference probably came from the park factor). I can understand not having him in first, but #8 (and nearly 2 wins out of first) is a leap.

And then there’s the big one. Sandy Koufax 1965, the #4 season ever by GSDev, and #7 in the sample according to fWAR, ranks fourth for the year in bWAR. Admittedly, his margin behind Sam McDowell and Jim Maloney is minuscule. But Juan Marichal bests him by 10.3 bWAR to 8.1. Marichal’s 1965 is quite a season in its own right, one of the top 60 of all time by GSDev. Koufax won the ERA title, but Marichal only trailed him by 0.09, and was in a tougher park; he led the majors in ERA+. However, Koufax threw 40 more innings (not even counting the playoffs). You’d think they’d at least be close. The difference? According to bWAR, the Giants had an average defense, while LA’s was phenomenal. This was not typical; for most of Koufax’s best years, the Dodger fielders grade as average or below. From 1961-66, Koufax’s fielding adjustment per 9 innings in bWAR goes: -0.20, -0.07, -0.15, 0.03, 0.30, -0.07. Now, if you look at Koufax’s BABIP in ’65, you might buy it; his .238 mark was 40 points better than league average. Marichal’s BABIP, by contrast, was .238. Which is not actually a contrast, given that it’s the same number.

This is all having the discussion on bWAR’s terms; we’re ignoring Koufax’s 382 strikeouts, 1.93 FIP, and phenomenal World Series performance. Even looking at the things that bWAR looks at, I don’t agree that there are two wins of margin in Marichal’s favor here.

Having dumped on one WAR system, let’s switch to the other! Here are the 20 seasons in which fWAR is the disagreeable option:

Year | bWAR/GSDev Choice | fWAR Rank | fWAR Choice
1914 | Walter Johnson | 2 | Cy Falkenberg
1917 | Eddie Cicotte | 3 | Grover Alexander
1921 | Red Faber | 3 | Stan Coveleski
1922 | Red Faber | 2 | Urban Shocker
1926 | George Uhle | 2 | Lefty Grove
1933 | Carl Hubbell | 2 | Dizzy Dean
1940 | Bob Feller | 2 | Ray Brown
1952 | Bobby Shantz | 2 | Robin Roberts
1955 | Billy Pierce | 2 | Bob Rush
1967 | Jim Bunning | 2 | Dean Chance
1969 | Bob Gibson | 2 | Sam McDowell
1973 | Tom Seaver | 4 | Bert Blyleven
1975 | Jim Palmer | 4 | Tom Seaver
1992 | Greg Maddux | 3 | Roger Clemens
1993 | Kevin Appier | 4 | Greg Maddux
1995 | Greg Maddux | 2 | Randy Johnson
2000 | Pedro Martinez | 2 | Randy Johnson
2010 | Roy Halladay | 4 | Cliff Lee
2017 | Corey Kluber | 2 | Chris Sale
2022 | Sandy Alcantara | 4 | Aaron Nola

The fWAR table looks a bit more under control than the bWAR table; none of the consensus top finishers from the other two systems land outside the top 4, and 12 of the 20 finish #2. Of those, the obvious one to highlight is Unit over Pedro in 2000, a year in which Martinez had the single highest GSDev score to date. Johnson had more innings, but not by a huge margin (248.2 to 217), and Pedro led the majors in FIP and a number of related categories. Johnson’s year was excellent as well (he led the NL in FIP and ERA+, and the majors in strikeouts and strikeout rate), but Pedro beat him by 36 points in FIP despite being in the DH league. To put it mildly, this outcome is a surprise for me.

The other standout seasons are the years in which the fWAR fourth-place finisher leads in the other two systems. There’s not necessarily going to be much to say here in most cases. Take 1973, Seaver vs. Blyleven. Seaver has a better ERA (and allowed just 7 unearned runs to Blyleven’s 18), and pitched extremely well in the playoffs. Blyleven had a better FIP and pitched 35 more innings in the regular season (Seaver’s postseason makes up most of that difference as well). Seaver vs. Palmer two years later puts Tom Terrific on the other side of the exchange; Palmer had a 2.09 ERA but a 2.96 FIP that year. Appier vs. Maddux? Appier’s RA was lower despite being in the DH league; Maddux’s FIP was lower and he pitched more innings. (Maddux also struggled in the postseason that year, which doesn’t interest fWAR at all.) Halladay/Lee and Alcantara/Nola are more of the same, with FIP/ERA differences in opposite directions. The 2010 Halladay/Lee race was incredibly close in GSDev, as were some of the others listed here (Cicotte/Alexander in 1917, Maddux/Clemens in 1992).

Speaking of close races, we should mention Shantz vs. Roberts in 1952. GSDev has the margin between those two pitchers as 0.01, the narrowest in any season. 1952 was roughly one wild pitch away from being in the bWAR table instead of this one. (As it happens, Shantz threw 0 wild pitches in 1952, and Roberts threw 2; that was enough to swing the results.)
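The claim that the fWAR table is better behaved than the bWAR table can be tallied straight from the rank columns of the two lone-holdout tables. Here is a small Python sketch of that count. The dictionary names and the `summarize` helper are mine, purely for illustration, and the ranks are copied from the tables above.

```python
from collections import Counter

# Rank of the fWAR/GSDev consensus pick in bWAR, by year.
BWAR_RANKS = {
    1909: 2, 1915: 2, 1916: 2, 1920: 3, 1941: 3, 1947: 2, 1956: 2,
    1958: 2, 1965: 4, 1978: 2, 1979: 8, 1984: 4, 1986: 3, 1988: 5,
    1991: 2, 1996: 4, 2002: 2, 2004: 2, 2005: 3, 2014: 2, 2015: 3,
    2016: 3, 2018: 2, 2019: 5, 2021: 10,
}

# Rank of the bWAR/GSDev consensus pick in fWAR, by year.
FWAR_RANKS = {
    1914: 2, 1917: 3, 1921: 3, 1922: 2, 1926: 2, 1933: 2, 1940: 2,
    1952: 2, 1955: 2, 1967: 2, 1969: 2, 1973: 4, 1975: 4, 1992: 3,
    1993: 4, 1995: 2, 2000: 2, 2010: 4, 2017: 2, 2022: 4,
}


def summarize(ranks):
    """Count seasons, #2 and #3 finishes, and the worst rank in one table."""
    counts = Counter(ranks.values())
    return {
        "seasons": len(ranks),
        "second": counts[2],
        "third": counts[3],
        "worst": max(ranks.values()),
    }


print("bWAR holdout:", summarize(BWAR_RANKS))
print("fWAR holdout:", summarize(FWAR_RANKS))
```

Both tables have twelve runner-up finishes; the difference is in the tail, where bWAR drops the consensus pick as low as #10 and fWAR never goes below #4.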

On to our final group for this post: years in which bWAR and fWAR agree on the best pitcher, and GSDev dissents.

Year | bWAR/fWAR Choice | GSDev Rank | GSDev Choice
1904 | Rube Waddell | 2 | Jack Chesbro
1907 | Christy Mathewson | 2 | Cy Young
1908 | Christy Mathewson | 2 | Ed Walsh
1925 | Bullet Rogan | NA | Dazzy Vance
1935 | Lefty Grove | 2 | Cy Blanton
1937 | Lefty Grove | 2 | Lefty Gomez
1946 | Bob Feller | 2 | Hal Newhouser
1964 | Dean Chance | 2 | Don Drysdale
2011 | Roy Halladay | 2 | Justin Verlander
2020 | Shane Bieber | 2 | Trevor Bauer

The obvious standout here is Negro League great Bullet Rogan in 1925, who I don’t have the data to rank via GSDev. Given the agreement between bWAR and fWAR on Rogan’s prowess and the fact that neither Vance nor anyone else in MLB had a particularly exceptional year, I’m reasonably confident that Rogan should indeed be in the #1 spot.

Outside of that, there are nine seasons in which both WAR systems agree on the best pitcher and GSDev does not. In all nine cases, GSDev ranks the WAR consensus in the #2 spot. That is both fewer and smaller discrepancies than have arisen from the other two systems – but that makes sense, because GSDev is to some extent a compromise between the two WARs, accounting for factors used by both of them. So what causes the differences?

The most obvious option is one we’ve brought up frequently throughout the series: the postseason, which GSDev counts and both WARs ignore. This is not as much of a factor in these particular years as might be expected, for a straightforward reason: of the 18 pitchers in question, only 5 reached the postseason. Those were Gomez in 1937, Verlander and Halladay in 2011, and Bauer and Bieber in 2020. Working in reverse order: 2020 was definitely decided by the postseason; Bieber had his worst start of the year in his only playoff outing, while Bauer’s lone playoff start was his best of the shortened campaign. 2011 has the opposite effect; Halladay’s two excellent postseason starts brought him up quite a bit, while Verlander’s four mediocre outings didn’t move his score at all. Meanwhile, in ’37, Gomez does lose two very good World Series starts if you ignore the playoffs… and his margin over Grove is so big that he easily wins anyway.

So, we’ve accounted for one season out of nine. Let’s run through the others chronologically.

1904, Chesbro vs. Waddell. For two deadball aces, you’d think relief work would be a factor; it is, but only to a small extent. Waddell had no relief outings this year; Chesbro had four (allowing 6 runs in 10.2 innings), compared to 51 starts (48 of which he completed). Waddell had 349 strikeouts, a total that nobody would match for over 60 years, but he also allowed more walks, homers, and hit batters than Chesbro in 60 fewer innings as a starter. Waddell’s per-inning rates of runs and hits allowed were also higher. Chesbro’s sets of parks and opponents were a bit tougher, and he recorded about one extra out per start. The extra 100-plus strikeouts (in five fewer starts) are a big deficit to make up; GSDev thinks Chesbro does enough.

1907, Young vs. Mathewson. These two had a bit more relief work – but Young, GSDev’s choice as the superior starter, also pitched better in relief. Mathewson holds a slight lead in raw Game Score, 71.9 to 71.3, but he was in the lower-scoring NL, and avoided facing the second highest-scoring team in the league (because he played for them). Young’s Boston Americans, meanwhile, sported the AL’s feeblest lineup. The environmental adjustment boosts Young’s per-game numbers above Matty’s, and he had an extra start on top of that.

1908, Walsh vs. Mathewson. I promise GSDev doesn’t hate Christy Mathewson; it does give him a couple of #1 finishes that we’ll discuss in the next post. For now, though, we finally get some relevant bullpen work to examine. This time, it very well may make a substantive difference, as Mathewson’s 12 relief appearances added up to 28 innings and only 5 runs allowed, while Walsh’s 17 bullpen efforts comprised 38 innings and 18 runs. Removing his comparatively ineffective relief outings nudges Walsh’s RA as a starter just below Mathewson’s. Matty allowed fewer hits and walks per inning; he gave up 5 homers to Walsh’s 1, but Walsh allowed far more hit batters and wild pitches. And this time, it’s Walsh who had the more pitcher-friendly environment. The difference between them on a per-start basis is razor-thin… but Walsh had five more starts, and thus pulls ahead.

1935, Blanton vs. Grove. Relief is a factor here as well; the pitchers made five relief outings each, but Blanton allowed six runs in his 9.1 bullpen innings to Grove’s three in 12. The rest of the difference is hard to assess, largely because Grove and Blanton pitched in different leagues, and there was a large, persistent difference between the leagues in the ’30s, with the AL being notably higher-scoring every year. GSDev adjusts for this, of course; Grove’s average park-opponent combo was 0.41 runs per game higher than Blanton’s. That adjustment makes up nearly 60% of the difference in their raw per-game numbers, but that still leaves Blanton ahead. It may genuinely be the relief outings making up the whole difference here.
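To put the relief lines quoted for 1904, 1908, and 1935 on a common scale, they can be converted to runs allowed per nine innings. The sketch below is a rough Python helper, with function names of my own choosing. It reads innings in the usual baseball notation, where ".1" and ".2" mean one and two outs rather than decimal fractions, and it uses the inning totals exactly as quoted in the text.

```python
def innings_to_outs(ip):
    """Convert baseball innings notation ("10.2" = 10 2/3 innings) to outs."""
    whole, _, frac = ip.partition(".")
    thirds = int(frac) if frac else 0
    if thirds not in (0, 1, 2):
        raise ValueError(f"not valid innings notation: {ip!r}")
    return int(whole) * 3 + thirds


def ra9(runs, ip):
    """Runs allowed per nine innings (27 outs)."""
    return runs * 27 / innings_to_outs(ip)


# (pitcher, season, relief runs allowed, relief innings) as quoted above.
relief_lines = [
    ("Chesbro", 1904, 6, "10.2"),
    ("Mathewson", 1908, 5, "28"),
    ("Walsh", 1908, 18, "38"),
    ("Blanton", 1935, 6, "9.1"),
    ("Grove", 1935, 3, "12"),
]

for name, year, runs, ip in relief_lines:
    print(f"{name} {year}: {ra9(runs, ip):.2f} RA/9 in relief")
```

Working in outs avoids the trap of treating 9.1 innings as the decimal 9.1 instead of 9 1/3.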

1937, Gomez vs. Grove. Most of these matchups are pretty close, as you might expect from seasons in which GSDev disagrees with a WAR consensus. This one? GSDev has this as a blowout. So what’s up? Well, first things first; neither pitcher made a relief appearance all year. Gomez had the better ERA (2.33 to 3.02) and FIP (3.29 to 3.44), leading the league in both categories. He also led the league in hits per 9, strikeouts, strikeouts per 9, strikeout to walk ratio, and shutouts. He had two more regular season starts than Grove, and their average innings per start were nearly identical. Grove was in tougher parks, and Gomez ducked the supremely excellent Yankee lineup; accounting for this only narrows his Game Score advantage from 5.9 to 3.5. And that’s before mentioning Gomez’s two complete game wins in the World Series. Sometimes it’s a challenge to figure out what GSDev is thinking; this is not one of those times.

1946, Newhouser vs. Feller. This is a close race in all three systems. It’s also relatively easy to explain. Newhouser was a better pitcher on a per-game basis; he led the AL in both ERA and FIP despite working in a hitters’ park while Feller benefited from the best pitchers’ park in the game. Feller made 8 more starts, totaling 70 additional innings. Both pitchers fare spectacularly well in all three systems, but GSDev narrowly prefers Newhouser’s per-game excellence, while WAR leans toward Feller’s higher volume.

1964, Drysdale vs. Chance. In the season that inverts the last two digits of 1946, the race’s outcome is also inverted. This time, it’s Chance who was better on a per-game basis, and Drysdale who had the extra starts. GSDev picking Drysdale honestly surprises me. Chance’s ERA and FIP were both better, despite being in a similar park in the higher-scoring league. Chance had 11 relief outings, but pitched better as a starter than as a reliever. The other little things that might result in a swing (unearned runs, wild pitches, hit batters) are all in Chance’s favor as well. The only things keeping Drysdale’s per-start numbers close are a higher inning count (8 innings per start to 7.3) and a lower walk rate (1.7 per start to 2.2). Ultimately, Chance leads Drysdale in average adjusted Game Score, 64.9 to 64.1, but that margin is close enough for Drysdale’s 40-35 lead in starts to make the difference. (As a bonus note, Sandy Koufax blew past both of them in average Game Score, with a 67.3, but in only 28 starts.)
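The GSDev formula itself is not restated in this post, so what follows is a toy model and not the real calculation. Suppose, purely as an assumption, that a season total behaves like starts times (average adjusted Game Score minus some baseline). Under that assumption, this Python sketch finds the baseline at which five extra starts exactly cancel a 0.8-point per-start deficit. The function names and the baseline idea are hypothetical.

```python
def toy_total(avg_game_score, starts, baseline):
    """Toy season total: starts * (average - baseline). NOT the GSDev formula."""
    return starts * (avg_game_score - baseline)


def break_even_baseline(avg_a, starts_a, avg_b, starts_b):
    """Baseline at which the two toy totals are equal (assumed model only)."""
    if starts_a == starts_b:
        raise ValueError("equal starts: the higher average always wins")
    return (starts_b * avg_b - starts_a * avg_a) / (starts_b - starts_a)


# Averages and start counts quoted in the paragraph above.
chance = (64.9, 35)
drysdale = (64.1, 40)

pivot = break_even_baseline(*chance, *drysdale)
print(f"break-even baseline: {pivot:.1f}")

# Try one baseline on each side of the pivot.
for baseline in (50, 60):
    c = toy_total(*chance, baseline)
    d = toy_total(*drysdale, baseline)
    leader = "Drysdale" if d > c else "Chance"
    print(f"baseline {baseline}: Chance {c:.1f}, Drysdale {d:.1f} -> {leader}")
```

In this toy version, the pitcher with more starts comes out ahead whenever the baseline sits below the break-even point, which is the general shape of the volume-versus-rate tradeoff described in the 1946 and 1964 races.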

2011, Verlander vs. Halladay. If this were regular-season only, Verlander would win easily; his average adjusted Game Score was higher and he had two additional starts. This lead was likely because of something that has been mentioned a couple of times so far without really being highlighted: hits allowed. GSDev is the only one of these systems that considers hits allowed as a factor. Even including the postseason (in which Halladay pitched well and Verlander struggled somewhat, although he did extend his advantage in starts), Verlander allowed 6.4 hits per 9 innings to Halladay’s 7.8. That was enough to keep pace with Halladay’s superior fielding-independent numbers, which in turn allowed the extra starts to make the difference.

And that’s all ten of GSDev’s dissents explored. Next time, we’ll go through the remaining category: the 21 seasons in which GSDev, bWAR, and fWAR have all agreed to disagree on the identity of baseball’s best pitcher.