Monday, December 22, 2025

Starting Pitcher Ratings: Starting Pitcher of the Year

So far in this series, we’ve used the GSDev rating system to examine the best starting pitching seasons and careers (over the course of three parts) since the founding of the American League. Now, we’ll be looking at the best pitchers year by year, and doing so in comparison to the systems whose deficiencies inspired us to take a fresh look in the first place: bWAR and fWAR.

We have 122 seasons of essentially complete GSDev data, stretching from 1901 to 2022. When considering the best pitcher according to all three systems, we can split those seasons into five categories: all systems disagree, GSDev and bWAR agree, GSDev and fWAR agree, bWAR and fWAR agree, and all systems agree. There’s a bit of a paradox in this analysis: the category that produces the most interesting efforts by individual players is probably the least interesting for comparing the systems. So let’s start with the years in which all three systems select the same best pitcher. There are 46 such seasons:

| Year | Pitcher | GSDev | bWAR | fWAR |
|------|---------|-------|------|------|
| 1901 | Cy Young | 14.43 | 12.4 | 7.8 |
| 1902 | Cy Young | 13.01 | 10.1 | 7.7 |
| 1911 | Ed Walsh | 14.37 | 9.2 | 7.6 |
| 1912 | Walter Johnson | 17.40 | 14.3 | 9.3 |
| 1913 | Walter Johnson | 18.48 | 15.2 | 8.5 |
| 1918 | Walter Johnson | 15.95 | 10.5 | 6.5 |
| 1919 | Walter Johnson | 15.46 | 10.8 | 6.8 |
| 1923 | Dolf Luque | 16.22 | 10.7 | 6.7 |
| 1924 | Dazzy Vance | 18.82 | 10.5 | 7.7 |
| 1928 | Dazzy Vance | 16.81 | 10.1 | 6.9 |
| 1930 | Lefty Grove | 15.98 | 10.4 | 8.3 |
| 1931 | Lefty Grove | 18.83 | 10.4 | 7.3 |
| 1932 | Lefty Grove | 15.57 | 9.5 | 7.0 |
| 1934 | Dizzy Dean | 15.85 | 8.9 | 6.5 |
| 1939 | Bob Feller | 16.92 | 9.2 | 6.5 |
| 1942 | Mort Cooper | 14.44 | 8.2 | 6.5 |
| 1943 | Spud Chandler | 15.37 | 6.4 | 6.3 |
| 1944 | Dizzy Trout | 14.74 | 9.3 | 7.3 |
| 1945 | Hal Newhouser | 17.17 | 11.3 | 8.0 |
| 1948 | Harry Brecheen | 14.05 | 8.7 | 7.7 |
| 1949 | Mel Parnell | 13.56 | 8.0 | 6.9 |
| 1950 | Ewell Blackwell | 11.91 | 7.5 | 6.4 |
| 1951 | Robin Roberts | 12.61 | 8.0 | 6.7 |
| 1953 | Robin Roberts | 15.98 | 9.8 | 8.4 |
| 1954 | Robin Roberts | 14.56 | 9.0 | 7.1 |
| 1957 | Frank Sullivan | 13.37 | 6.4 | 6.4 |
| 1959 | Camilo Pascual | 13.78 | 7.8 | 7.6 |
| 1963 | Sandy Koufax | 18.12 | 10.7 | 9.2 |
| 1966 | Sandy Koufax | 18.49 | 10.3 | 9.1 |
| 1968 | Bob Gibson | 20.75 | 11.2 | 8.6 |
| 1970 | Bob Gibson | 15.68 | 8.9 | 9.8 |
| 1972 | Steve Carlton | 18.49 | 12.1 | 11.1 |
| 1980 | Steve Carlton | 21.25 | 10.2 | 8.8 |
| 1985 | Dwight Gooden | 19.5 | 12.2 | 8.9 |
| 1987 | Roger Clemens | 17.11 | 9.4 | 8.4 |
| 1989 | Bret Saberhagen | 16.47 | 9.7 | 7.5 |
| 1990 | Roger Clemens | 15.87 | 10.4 | 8.2 |
| 1994 | Greg Maddux | 17.35 | 8.5 | 7.4 |
| 1997 | Roger Clemens | 20.41 | 11.9 | 10.7 |
| 1998 | Kevin Brown | 17.77 | 8.6 | 9.6 |
| 1999 | Pedro Martinez | 22.60 | 9.8 | 11.6 |
| 2001 | Randy Johnson | 22.14 | 10.1 | 10.4 |
| 2006 | Johan Santana | 16.05 | 7.6 | 6.7 |
| 2009 | Zack Greinke | 17.04 | 10.4 | 8.7 |
| 2012 | Justin Verlander | 16.18 | 8.1 | 6.9 |
| 2013 | Clayton Kershaw | 16.96 | 8.1 | 7.2 |

Unsurprisingly, that list contains some pretty terrific seasons. It also includes several years in which the victorious effort was relatively unimpressive, but the competition was even worse. Still, the nine pitchers with multiple unanimous wins are an impressive group: four for Walter Johnson, three for Roger Clemens, Lefty Grove, and Robin Roberts, and two apiece for Bob Gibson, Steve Carlton, Dazzy Vance, Cy Young, and Sandy Koufax. And of course, the greatest seasons ever have considerable representation on the list; among the top 27 GSDev scores to date (anything over 18), we see W. Johnson ’12 and ’13, Vance ’24, Grove ’31, Koufax ’63 and ’66, Gibson ’68, Carlton ’72 and ’80, Gooden ’85, Clemens ’97, Pedro ’99, and R. Johnson ’01.

That list, however, is not complete. Out of the 23 league-leading GSDev totals that exceeded 18, only 14 were unanimously acclaimed as the best pitcher in baseball. Notable omissions include Ron Guidry 1978, Grover Cleveland Alexander 1915, Greg Maddux 1995, Sandy Koufax 1965, and most startling of all, Pedro Martinez 2000 – the highest single-season GSDev score on record.

That brings us to the topic of disagreements. With 46 seasons of unanimity, that leaves 76 years of squabbling (including every season after 2013 – a streak that has continued through 2025, as bWAR and fWAR don’t agree in any of the years for which we don’t yet have final GSDev numbers). How do those seasons break down?

| Disagreer | Years |
|-----------|-------|
| bWAR | 25 |
| fWAR | 20 |
| GSDev | 10 |
| Everyone | 21 |
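For anyone who wants to redo the bookkeeping, here is a minimal Python sketch of the five-way split. The function name and category labels are my own placeholders, not anything from the GSDev tooling, and the sample seasons are copied from the tables in this post. It assumes each system names exactly one best pitcher, so it does not handle ties such as the 2016 fWAR tie.

```python
def classify(gsdev: str, bwar: str, fwar: str) -> str:
    """Return which of the five categories a season falls into."""
    if gsdev == bwar == fwar:
        return "all agree"
    if gsdev == fwar:
        return "bWAR dissents"
    if gsdev == bwar:
        return "fWAR dissents"
    if bwar == fwar:
        return "GSDev dissents"
    return "all disagree"


# Sample seasons from this post's tables, as (GSDev, bWAR, fWAR) picks.
# No "all disagree" season is included because none are listed in this post.
SEASONS = {
    1901: ("Cy Young", "Cy Young", "Cy Young"),
    1909: ("Mordecai Brown", "Christy Mathewson", "Mordecai Brown"),
    1914: ("Walter Johnson", "Walter Johnson", "Cy Falkenberg"),
    1904: ("Jack Chesbro", "Rube Waddell", "Rube Waddell"),
}
labels = {year: classify(*picks) for year, picks in SEASONS.items()}

# Category totals as reported in the post; they should cover all 122 seasons.
REPORTED = {
    "all agree": 46,
    "bWAR dissents": 25,
    "fWAR dissents": 20,
    "GSDev dissents": 10,
    "all disagree": 21,
}
total_seasons = sum(REPORTED.values())

for year, label in sorted(labels.items()):
    print(year, label)
print("total seasons:", total_seasons)
```

The order of the checks matters only for the unanimous case, which has to be tested first; after that, at most one pair can match.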

That’s a relatively even split between bWAR and fWAR, a reasonable chunk of unanimous agreement to disagree… and remarkably few in which GSDev is the lone holdout. Let’s go through these categories one at a time, starting with our most common objector.

| Year | fWAR/GSDev Choice | bWAR Rank | bWAR Choice |
|------|-------------------|-----------|-------------|
| 1909 | Mordecai Brown | 2 | Christy Mathewson |
| 1915 | Grover Alexander | 2 | Walter Johnson |
| 1916 | Walter Johnson | 2 | Grover Alexander |
| 1920 | Stan Coveleski | 3 | Grover Alexander |
| 1941 | Whit Wyatt | 3 | Thornton Lee |
| 1947 | Ewell Blackwell | 2 | Warren Spahn |
| 1956 | Herb Score | 2 | Early Wynn |
| 1958 | Sam Jones | 2 | Frank Lary |
| 1965 | Sandy Koufax | 4 | Juan Marichal |
| 1978 | Ron Guidry | 2 | Phil Niekro |
| 1979 | J.R. Richard | 8 | Phil Niekro |
| 1984 | Dwight Gooden | 4 | Dave Stieb |
| 1986 | Mike Scott | 3 | Teddy Higuera |
| 1988 | Roger Clemens | 5 | Mark Gubicza |
| 1991 | Roger Clemens | 2 | Tom Glavine |
| 1996 | John Smoltz | 4 | Pat Hentgen |
| 2002 | Curt Schilling | 2 | Randy Johnson |
| 2004 | Randy Johnson | 2 | Johan Santana |
| 2005 | Johan Santana | 3 | Roger Clemens |
| 2014 | Clayton Kershaw | 2 | Corey Kluber |
| 2015 | Clayton Kershaw | 3 | Zack Greinke |
| 2016 | Clayton Kershaw* | 3 | Justin Verlander |
| 2018 | Jacob deGrom | 2 | Aaron Nola |
| 2019 | Gerrit Cole | 5 | Mike Minor |
| 2021 | Corbin Burnes | 10 | Zack Wheeler |

That’s 25 years in which bWAR was the lone holdout (depending on how you look at 2016, in which Kershaw was tied for first in fWAR with Jose Fernandez). In twelve of them, the fWAR/GSDev choice ranked second, which is generally understandable. Throw in another six third-place finishes, which usually aren’t too out there. (Even the near misses are amusing at times, like with the Alexander/Johnson switcheroo in 1915-16, or the Schilling-Johnson-Santana-Clemens ring-around-the-rosie in the early 2000s.)
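The rank counts in that paragraph are easy to re-derive. As a quick sketch (the dictionary is simply the bWAR rank column from the table above typed in by hand, and the variable names are my own), a `Counter` does the tallying:

```python
from collections import Counter

# bWAR rank of the fWAR/GSDev choice, keyed by season, copied from the table.
BWAR_RANK = {
    1909: 2, 1915: 2, 1916: 2, 1920: 3, 1941: 3,
    1947: 2, 1956: 2, 1958: 2, 1965: 4, 1978: 2,
    1979: 8, 1984: 4, 1986: 3, 1988: 5, 1991: 2,
    1996: 4, 2002: 2, 2004: 2, 2005: 3, 2014: 2,
    2015: 3, 2016: 3, 2018: 2, 2019: 5, 2021: 10,
}

rank_counts = Counter(BWAR_RANK.values())
second = rank_counts[2]
third = rank_counts[3]
# Everything ranked fourth or worse is what the post calls an oddball.
fourth_or_lower = sum(n for rank, n in rank_counts.items() if rank >= 4)
# Season with the largest gap between the consensus pick and bWAR.
worst_year = max(BWAR_RANK, key=BWAR_RANK.get)

print("seasons:", len(BWAR_RANK))
print("second place:", second)
print("third place:", third)
print("fourth or lower:", fourth_or_lower)
print("largest discrepancy:", worst_year, "at rank", BWAR_RANK[worst_year])
```

That leaves seven seasons outside the top three, which are the ones worth a closer look.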

There are some oddballs, though. We’ve talked about 2019 Gerrit Cole and 2021 Corbin Burnes during the early posts in this series; those were two of the bigger bWAR discrepancies of the bunch. As far as the other fourth-and-below finishes go: Smoltz in ’96 is a fairly standard bWAR/fWAR tiff (FIP lower than ERA despite a good-fielding team), with a side of brilliant postseason work. Despite finishing in fifth place, Clemens in ’88 is a surprisingly close contender, less than a win back of first. Gooden in ’84 had a shocking-for-the-time 11.4 K/9 in his rookie year, thereby producing one of the best FIP seasons ever; his ERA was nearly a run higher.

Richard in ’79 surprises me. Yes, his FIP is lower than his ERA by half a run – but he still led the majors in ERA. Houston was a pitcher’s park, but not a crazy one (Richard’s personal park factor for the year was 93, according to B-R, higher than any he’d had since 1975). bWAR thinks the Astros’ defense was good that year, but Richard’s BABIP allowed was barely different from average (and that difference probably came from the park factor). I can understand not having him in first, but #8 (and nearly 2 wins out of first) is a leap.

And then there’s the big one. Sandy Koufax 1965, the #4 season ever by GSDev, and #7 in the sample according to fWAR, ranks fourth for the year in bWAR. Admittedly, his margin behind Sam McDowell and Jim Maloney is minuscule. But Juan Marichal bests him by 10.3 bWAR to 8.1. Marichal’s 1965 is quite a season in its own right, one of the top 60 of all time by GSDev. Koufax won the ERA title, but Marichal only trailed him by 0.09, and was in a tougher park; he led the majors in ERA+. However, Koufax threw 40 more innings (not even counting the playoffs). You’d think they’d at least be close. The difference? According to bWAR, the Giants had an average defense, while LA’s was phenomenal. This was not typical; for most of Koufax’s best years, the Dodger fielders grade as average or below. From 1961-66, Koufax’s fielding adjustment per 9 innings in bWAR goes: -0.20, -0.07, -0.15, 0.03, 0.30, -0.07. Now, if you look at Koufax’s BABIP in ’65, you might buy it; his .238 mark was 40 points better than league average. Marichal’s BABIP, by contrast, was .238. Which is not actually a contrast, given that it’s the same number.

This is all having the discussion on bWAR’s terms; we’re ignoring Koufax’s 382 strikeouts, 1.93 FIP, and phenomenal World Series performance. Even looking at the things that bWAR looks at, I don’t agree that there are two wins of margin in Marichal’s favor here.

Having dumped on one WAR system, let’s switch to the other! Here are the 20 seasons in which fWAR is the disagreeable option:

| Year | bWAR/GSDev Choice | fWAR Rank | fWAR Choice |
|------|-------------------|-----------|-------------|
| 1914 | Walter Johnson | 2 | Cy Falkenberg |
| 1917 | Eddie Cicotte | 3 | Grover Alexander |
| 1921 | Red Faber | 3 | Stan Coveleski |
| 1922 | Red Faber | 2 | Urban Shocker |
| 1926 | George Uhle | 2 | Lefty Grove |
| 1933 | Carl Hubbell | 2 | Dizzy Dean |
| 1940 | Bob Feller | 2 | Ray Brown |
| 1952 | Bobby Shantz | 2 | Robin Roberts |
| 1955 | Billy Pierce | 2 | Bob Rush |
| 1967 | Jim Bunning | 2 | Dean Chance |
| 1969 | Bob Gibson | 2 | Sam McDowell |
| 1973 | Tom Seaver | 4 | Bert Blyleven |
| 1975 | Jim Palmer | 4 | Tom Seaver |
| 1992 | Greg Maddux | 3 | Roger Clemens |
| 1993 | Kevin Appier | 4 | Greg Maddux |
| 1995 | Greg Maddux | 2 | Randy Johnson |
| 2000 | Pedro Martinez | 2 | Randy Johnson |
| 2010 | Roy Halladay | 4 | Cliff Lee |
| 2017 | Corey Kluber | 2 | Chris Sale |
| 2022 | Sandy Alcantara | 4 | Aaron Nola |

The fWAR table looks a bit more under control than the bWAR table; none of the consensus top finishers from the other two systems lands outside the top 4, and 12 of the 20 finish #2. Of those, the obvious one to highlight is Unit over Pedro in 2000, a year in which Martinez had the single highest GSDev score to date. Johnson had more innings, but not by a huge margin (248.2 to 217), and Pedro led the majors in FIP and a number of related categories. Johnson’s year was excellent as well (he led the NL in FIP and ERA+, and the majors in strikeouts and strikeout rate), but Pedro beat him by 36 points in FIP despite being in the DH league. To put it mildly, this outcome is a surprise to me.

The other standout seasons are the years in which the fWAR fourth-place finisher leads in the other two systems. There’s not necessarily going to be much to say here in most cases. Take 1973, Seaver vs. Blyleven. Seaver has a better ERA (and allowed just 7 unearned runs to Blyleven’s 18), and pitched extremely well in the playoffs. Blyleven had a better FIP and pitched 35 more innings in the regular season (though Seaver’s postseason innings make up most of that difference). Seaver vs. Palmer two years later puts Tom Terrific on the other side of the exchange; Palmer had a 2.09 ERA but a 2.96 FIP that year. Appier vs. Maddux? Appier’s RA was lower despite being in the DH league; Maddux’s FIP was lower and he pitched more innings. (Maddux also struggled in the postseason that year, which doesn’t interest fWAR at all.) Halladay/Lee and Alcantara/Nola are more of the same, with FIP/ERA differences in opposite directions. The 2010 Halladay/Lee race was incredibly close in GSDev, as were some of the others listed here (Cicotte/Alexander in 1917, Maddux/Clemens in 1992).

Speaking of close races, we should mention Shantz vs. Roberts in 1952. GSDev has the margin between those two pitchers as 0.01, the narrowest in any season. 1952 was roughly one wild pitch away from being in the bWAR table instead of this one. (As it happens, Shantz threw 0 wild pitches in 1952, and Roberts threw 2; that was enough to swing the results.)

On to our final group for this post: years in which bWAR and fWAR agree on the best pitcher, and GSDev dissents.

| Year | bWAR/fWAR Choice | GSDev Rank | GSDev Choice |
|------|------------------|------------|--------------|
| 1904 | Rube Waddell | 2 | Jack Chesbro |
| 1907 | Christy Mathewson | 2 | Cy Young |
| 1908 | Christy Mathewson | 2 | Ed Walsh |
| 1925 | Bullet Rogan | NA | Dazzy Vance |
| 1935 | Lefty Grove | 2 | Cy Blanton |
| 1937 | Lefty Grove | 2 | Lefty Gomez |
| 1946 | Bob Feller | 2 | Hal Newhouser |
| 1964 | Dean Chance | 2 | Don Drysdale |
| 2011 | Roy Halladay | 2 | Justin Verlander |
| 2020 | Shane Bieber | 2 | Trevor Bauer |

The obvious standout here is Negro League great Bullet Rogan in 1925, who I don’t have the data to rank via GSDev. Given the agreement between bWAR and fWAR on Rogan’s prowess and the fact that neither Vance nor anyone else in MLB had a particularly exceptional year, I’m reasonably confident that Rogan should indeed be in the #1 spot.

Outside of that, there are nine seasons in which both WAR systems agree on the best pitcher and GSDev does not. In all nine cases, GSDev ranks the WAR consensus in the #2 spot. That is both fewer and smaller discrepancies than have arisen from the other two systems – but that makes sense, because GSDev is to some extent a compromise between the two WARs, accounting for factors used by both of them. So what causes the differences?

The most obvious option is one we’ve brought up frequently throughout the series: the postseason, which GSDev counts and both WARs ignore. This is not as much of a factor in these particular years as might be expected, for a straightforward reason: of the 18 pitchers in question, only 5 reached the postseason. Those were Gomez in 1937, Verlander and Halladay in 2011, and Bauer and Bieber in 2020. Working in reverse order: 2020 was definitely decided by the postseason; Bieber had his worst start of the year in his only playoff outing, while Bauer’s lone playoff start was his best of the shortened campaign. 2011 has the opposite effect; Halladay’s two excellent postseason starts brought him up quite a bit, while Verlander’s four mediocre outings didn’t move his score at all. Meanwhile, in ’37, Gomez does lose two very good World Series starts if you ignore the playoffs… and his margin over Grove is so big that he easily wins anyway.

So, we’ve accounted for one season out of nine. Let’s run through the others chronologically.

1904, Chesbro vs. Waddell. For two deadball aces, you’d think relief work would be a factor; it is, but only to a small extent. Waddell had no relief outings this year; Chesbro had four (allowing 6 runs in 10.2 innings), compared to 51 starts (48 of which he completed). Waddell had 349 strikeouts, a total that nobody would match for over 60 years, but he also allowed more walks, homers, and hit batters than Chesbro in 60 fewer innings as a starter. Waddell’s per-inning rates of runs and hits allowed were also higher. Chesbro’s sets of parks and opponents were a bit tougher, and he recorded about one extra out per start. The extra 100-plus strikeouts (in five fewer starts) are a big deficit to make up; GSDev thinks Chesbro does enough.

1907, Young vs. Mathewson. These two had a bit more relief work – but Young, GSDev’s choice as the superior starter, also pitched better in relief. Mathewson holds a slight lead in raw Game Score, 71.9 to 71.3, but he was in the lower-scoring NL, and avoided facing the second highest-scoring team in the league (because he played for them). Young’s Boston Americans, meanwhile, sported the AL’s feeblest lineup, which he likewise never got to pitch against. The environmental adjustment boosts Young’s per-game numbers above Matty’s, and he had an extra start on top of that.

1908, Walsh vs. Mathewson. I promise GSDev doesn’t hate Christy Mathewson; it does give him a couple of #1 finishes that we’ll discuss in the next post. For now, though, we finally get some relevant bullpen work to examine. This time, it very well may make a substantive difference, as Mathewson’s 12 relief appearances added up to 28 innings and only 5 runs allowed, while Walsh’s 17 bullpen efforts comprised 38 innings and 18 runs. Removing his comparatively ineffective relief outings nudges Walsh’s RA as a starter just below Mathewson’s. Matty allowed fewer hits and walks per inning; he gave up 5 homers to Walsh’s 1, but Walsh allowed far more hit batters and wild pitches. And this time, it’s Walsh who had the more pitcher-friendly environment. The difference between them on a per-start basis is razor-thin… but Walsh had five more starts, and thus pulls ahead.

1935, Blanton vs. Grove. Relief is a factor here as well; the pitchers made five relief outings each, but Blanton allowed six runs in his 9.1 bullpen innings to Grove’s three in 12. The rest of the difference is hard to assess, largely because Grove and Blanton pitched in different leagues, and there was a large, persistent difference between the leagues in the ’30s, with the AL being notably higher-scoring every year. GSDev adjusts for this, of course; Grove’s average park-opponent combo was 0.41 runs per game higher than Blanton’s. That adjustment makes up nearly 60% of the difference in their raw per-game numbers, but that still leaves Blanton ahead. It may genuinely be the relief outings making up the whole difference here.

1937, Gomez vs. Grove. Most of these matchups are pretty close, as you might expect from seasons in which GSDev disagrees with a WAR consensus. This one? GSDev has this as a blowout. So what’s up? Well, first things first; neither pitcher made a relief appearance all year. Gomez had the better ERA (2.33 to 3.02) and FIP (3.29 to 3.44), leading the league in both categories. He also led the league in hits per 9, strikeouts, strikeouts per 9, strikeout to walk ratio, and shutouts. He had two more regular season starts than Grove, and their average innings per start were nearly identical. Grove was in tougher parks, and Gomez ducked the supremely excellent Yankee lineup; accounting for this only narrows his Game Score advantage from 5.9 to 3.5. And that’s before mentioning Gomez’s two complete game wins in the World Series. Sometimes it’s a challenge to figure out what GSDev is thinking; this is not one of those times.

1946, Newhouser vs. Feller. This is a close race in all three systems. It’s also relatively easy to explain. Newhouser was a better pitcher on a per-game basis; he led the AL in both ERA and FIP despite working in a hitters’ park while Feller benefited from the best pitchers’ park in the game. Feller made 8 more starts, totaling 70 additional innings. Both pitchers fare spectacularly well in all three systems, but GSDev narrowly prefers Newhouser’s per-game excellence, while WAR leans toward Feller’s higher volume.

1964, Drysdale vs. Chance. In the season that inverts the last two digits of 1946, the race’s outcome is also inverted. This time, it’s Chance who was better on a per-game basis, and Drysdale who had the extra starts. GSDev picking Drysdale honestly surprises me. Chance’s ERA and FIP were both better, despite being in a similar park in the higher-scoring league. Chance had 11 relief outings, but pitched better as a starter than as a reliever. The other little things that might result in a swing (unearned runs, wild pitches, hit batters) are all in Chance’s favor as well. The only things keeping Drysdale’s per-start numbers close are a higher inning count (8 innings per start to 7.3) and a lower walk rate (1.7 per start to 2.2). Ultimately, Chance leads Drysdale in average adjusted Game Score, 64.9 to 64.1, but that margin is close enough for Drysdale’s 40-35 lead in starts to make the difference. (As a bonus note, Sandy Koufax blew past both of them in average Game Score, with a 67.3, but in only 28 starts.)
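To see how five extra starts can outweigh a 0.8-point per-start edge, here is a toy calculation in Python. To be clear, this is not the GSDev formula, which is laid out earlier in the series and not reproduced here. It assumes, purely for illustration, that a season total behaves like (average adjusted Game Score minus some baseline) times starts. The baseline of 50 and both function names are hypothetical; only the averages and start counts come from the paragraph above.

```python
def season_total(avg_game_score: float, starts: int, baseline: float) -> float:
    """Toy season value: per-start margin over a baseline, times starts."""
    return (avg_game_score - baseline) * starts


def break_even_baseline(avg_a: float, starts_a: int,
                        avg_b: float, starts_b: int) -> float:
    """Baseline at which two toy season totals are equal.

    Solves (avg_a - b) * starts_a == (avg_b - b) * starts_b for b.
    Requires starts_a != starts_b.
    """
    return (avg_a * starts_a - avg_b * starts_b) / (starts_a - starts_b)


# Average adjusted Game Score and starts, from the post.
CHANCE = (64.9, 35)
DRYSDALE = (64.1, 40)

HYPOTHETICAL_BASELINE = 50.0  # assumption for illustration only

chance_total = season_total(*CHANCE, HYPOTHETICAL_BASELINE)
drysdale_total = season_total(*DRYSDALE, HYPOTHETICAL_BASELINE)
crossover = break_even_baseline(*DRYSDALE, *CHANCE)

print(f"Chance:   {chance_total:.1f}")
print(f"Drysdale: {drysdale_total:.1f}")
print(f"Break-even baseline: {crossover:.1f}")
```

Under this toy model, Drysdale's volume wins for any baseline below the break-even value and Chance's per-start edge wins above it, so the outcome depends heavily on where the baseline sits.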

2011, Verlander vs. Halladay. If this were regular-season only, Verlander would win easily; his average adjusted Game Score was higher and he had two additional starts. This lead was likely because of something that has been mentioned a couple of times so far without really being highlighted: hits allowed. GSDev is the only one of these systems that considers hits allowed as a factor. Even including the postseason (in which Halladay pitched well and Verlander struggled somewhat, although he did extend his advantage in starts), Verlander allowed 6.4 hits per 9 innings to Halladay’s 7.8. That was enough to keep pace with Halladay’s superior fielding-independent numbers, which in turn allowed the extra starts to make the difference.

And that’s all ten of GSDev’s dissents explored. Next time, we’ll go through the remaining category: the 21 seasons in which GSDev, bWAR, and fWAR have all agreed to disagree on the identity of baseball’s best pitcher.
