Monday, November 24, 2025

Starting Pitcher Ratings: Top 100 Careers, 100-51

Having reviewed the existing pitcher ratings, talked about Game Score and its adjustments as the basis for a new rating system, introduced the concept behind Game Score Deviations and then the results of the metric itself, and finally explored what a career rating system might look like, it’s time for the fun stuff. Today, we start going through the top 100 starting pitchers of all time.

(Well, of 1901-2022, because those are the years for which I have completed scores. And as has been previously noted, Baseball Reference doesn’t have game-by-game Negro League data, and therefore neither do I. But “top 100 of all time” sounds better.)

As a reminder from the post on career evaluation, we’re looking at a hybrid of two different GSDev-based approaches. First, add up the squares of all of the pitcher’s positive GSDev totals and take the square root of the sum (root sum square, or RSS, for short). Second, a peak-weighted sum (100% weight for the pitcher’s best season, 95% for second best, and so on), with the result multiplied by 0.39 to scale down to the same range as the RSS values. The pitchers will be ranked by the average of those two numbers, which will be listed as “Combo.”
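For anyone who wants to follow the arithmetic, here is a minimal Python sketch of the Combo calculation as described above. The function names are made up for illustration. Two details are assumptions rather than anything stated in this post: that the weights keep dropping by five points per season until they reach zero, and that only positive seasons count toward the weighted sum.

```python
import math

def rss(season_gsdevs):
    """Root sum square of the positive season GSDev totals."""
    return math.sqrt(sum(g * g for g in season_gsdevs if g > 0))

def weighted_sum(season_gsdevs, step=0.05):
    """Peak-weighted sum: best season at 100%, second best at 95%, and so on.

    Assumption: weights fall by `step` per season and never go below zero,
    and only positive seasons are included.
    """
    ranked = sorted((g for g in season_gsdevs if g > 0), reverse=True)
    return sum(g * max(0.0, 1.0 - step * i) for i, g in enumerate(ranked))

def combo(rss_value, wt_sum, scale=0.39):
    """Average of RSS and the weighted sum scaled down by 0.39."""
    return (rss_value + scale * wt_sum) / 2

# Check against a published row: RSS 30.30 and WtSum 77.2 (Madison Bumgarner)
print(round(combo(30.30, 77.2), 2))
```

Feeding in the RSS and WtSum columns from the tables below should reproduce the Combo column to within a rounding step, since the published inputs are themselves rounded.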

To get a sense of how GSDev fares against conventional statistical wisdom, I also pulled bWAR totals for the same 1901-2022 period (filtering out relievers to even out the rankings), and we’ll be comparing the bWAR ranks of our top 100 to their GSDev ranks. There are some pretty noteworthy differences, which we’ll explore as we go. But for now, let’s get to the tables, starting with the bottom 10 of the top 100:

| Rank | Pitcher | Years | RSS | WtSum | Combo | bWAR Rank |
|---|---|---|---|---|---|---|
| 100 | Madison Bumgarner | 2009-22 | 30.30 | 77.2 | 30.20 | 206 |
| 99 | Paul Derringer | 1931-45 | 29.81 | 78.6 | 30.24 | 149 |
| 98 | Corey Kluber | 2012-22 | 32.29 | 73.3 | 30.43 | 187 |
| 97 | Josh Beckett | 2001-14 | 30.69 | 77.8 | 30.51 | 179 |
| 96 | Jimmy Key | 1985-98 | 30.43 | 79.2 | 30.66 | 89 |
| 95 | John Lackey | 2002-17 | 30.28 | 79.7 | 30.68 | 158 |
| 94 | Catfish Hunter | 1965-79 | 30.97 | 78.4 | 30.77 | 174 |
| 93 | Mickey Lolich | 1963-79 | 30.59 | 80.2 | 30.93 | 95 |
| 92 | Mark Buehrle | 2000-15 | 30.51 | 80.8 | 31.01 | 51 |
| 91 | Frank Viola | 1982-96 | 31.38 | 79.1 | 31.12 | 99 |

When going through ranking lists like this, I can always find something to talk about with a little effort. But sometimes, the numbers make it much easier by providing a narratively perfect result. One such result is the #100 ranking of Madison Bumgarner, who was a good pitcher in the regular season but has a strong argument as the best postseason starting pitcher ever. The numbers are not actually saying “well, if the postseason counts, we should probably include the guy who has the best postseason record, so let’s squeeze him on at the end of the list.” But it kind of feels that way.

Bumgarner is one of the five pitchers from this group who have debuted since 2000. Four of the five are rated at #158 or lower by bWAR. By contrast, Mark Buehrle is bWAR’s #51 pitcher, by far the highest ranking in this group. The difference largely comes down to the peak adjustment, which I’m using for GSDev but not for bWAR. Buehrle has nearly double Kluber’s WAR total (60-34), but Kluber’s peak is significantly higher in both systems (as you might guess from his having more Cy Young awards than Buehrle has seasons in which he received even a single Cy Young vote). Peak adjustment will remain a complicating factor in the ranking comparisons as we move forward.

Moving a bit further back in time, let’s talk Catfish Hunter, whose ranking is guaranteed to make everyone unhappy. WAR is generally not a fan of Hunter, who benefited from playing in pitcher-friendly parks for great teams. However, he had both a fairly high peak and a good deal of success in the playoffs, and his best playoff performances tended to follow his best regular seasons (1972 and ’74 in particular). That being said, 91-100 is generally not Hall of Fame territory; there are fewer than 90 total pitchers in the Hall, and that group includes relievers, Negro Leaguers, and pre-1900 pitchers, all of whom are omitted here. Hunter is immediately behind Mickey Lolich, an exact contemporary who is preferred by both major WAR systems but who also got nowhere near joining Hunter in the Hall of Fame (despite having a pretty remarkable postseason record in his own right). So whether you’re a Hunter fan or a WAR disciple, you can find something to complain about here.

Onward and slightly upward! Let’s move on to numbers 90-81:

| Rank | Pitcher | Years | RSS | WtSum | Combo | bWAR Rank |
|---|---|---|---|---|---|---|
| 90 | David Wells | 1987-2007 | 30.70 | 81.4 | 31.21 | 68 |
| 89 | Babe Adams | 1906-25 | 30.86 | 81.2 | 31.26 | 83 |
| 88 | Bartolo Colon | 1997-2018 | 30.68 | 81.6 | 31.26 | 93 |
| 87 | Fernando Valenzuela | 1981-97 | 31.80 | 79.1 | 31.33 | 168 |
| 86 | Bucky Walters | 1934-48 | 32.00 | 80.5 | 31.69 | 106 |
| 85 | Stan Coveleski | 1912-28 | 31.86 | 80.9 | 31.71 | 34 |
| 84 | Gerrit Cole | 2013-22 | 32.69 | 79.6 | 31.87 | 205 |
| 83 | Mark Langston | 1984-99 | 31.90 | 82.2 | 31.98 | 84 |
| 82 | David Price | 2008-21 | 32.09 | 81.9 | 32.01 | 143 |
| 81 | Chuck Finley | 1987-2002 | 31.71 | 83.7 | 32.17 | 54 |

We’ve talked a few times about how GSDev’s interpretation of the early 1900s is far less rosy than bWAR’s. Coveleski is our first encounter with this effect in the career rankings. He was a fine pitcher, scoring as the best in baseball in 1920 and adding two ERA titles outside of that season. GSDev gives him seven seasons graded as an ace, and three in the top quartile (top 4, in his era). But his career isn’t especially long by top-100 standards (11 seasons of 20-plus starts), and like many pre-integration pitchers, his large WAR totals tended to be built on lots of innings rather than game-to-game excellence. The results offer a direct comparison to his contemporary Babe Adams, who does not have the same ranking disparity between WAR and GSDev. Adams tended to miss more time in-season (Coveleski’s fourth-highest innings total exceeded Adams’s second-highest), and counteracted that deficit by being a little better per-game (his career average adjusted Game Score is a point higher, driven by a strikeout-to-walk ratio that nearly doubles Coveleski’s), and that is a combination that GSDev can get on board with.

On a related but opposite note, Fernando Valenzuela is our first representative who peaked in the late ’70s or early ’80s, the period with the lowest measured scrub deviations to date. He won’t be the last. Valenzuela had just five ace-level seasons (only two pitchers in the top 100 had fewer), but all five of them were top quartile. He also had an exemplary playoff resume – 8 starts, 63 innings, 14 runs allowed (2.00 ERA). Five of those starts came in 1981, pushing him to 30 starts and 233 total innings for the year; just looking at his numbers, you’d never know it was a strike-shortened season.

Bonus note before we get to the obvious guy: Mark Langston and Chuck Finley shared a rotation in Anaheim for eight years, and end up within a fraction of each other in the rankings, which is fun.

All right, time to talk about Gerrit Cole. By arithmetic difference in ranking (+121), Cole is the pitcher that GSDev thinks bWAR underrates the most of anyone in either system’s top 100. If you take the difference in square roots instead, to account for the fact that the gap from (say) 100 to 80 is very different from the gap from 25 to 5, the biggest disagreement is… still Gerrit Cole.
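The two ways of measuring rank disagreement can be sketched in a few lines of Python. The function names are hypothetical, and the example uses only two rows from the tables in this post (Cole at GSDev #84 and bWAR #205, Bumgarner at #100 and #206), so it illustrates the calculation rather than confirming the claim across the full top 100.

```python
import math

def rank_gap(gsdev_rank, bwar_rank):
    """Arithmetic difference: positive means GSDev ranks the pitcher higher."""
    return bwar_rank - gsdev_rank

def sqrt_rank_gap(gsdev_rank, bwar_rank):
    """Difference in square roots, which shrinks gaps far down the list."""
    return math.sqrt(bwar_rank) - math.sqrt(gsdev_rank)

# (GSDev rank, bWAR rank) taken from the tables in this post
pitchers = {"Gerrit Cole": (84, 205), "Madison Bumgarner": (100, 206)}

for name, (gsdev, bwar) in pitchers.items():
    print(name, rank_gap(gsdev, bwar), round(sqrt_rank_gap(gsdev, bwar), 2))
```

The square-root version compresses differences between large rank numbers, which is why a 20-place gap near the bottom of the list counts for much less than a 20-place gap near the top.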

Cole’s career through 2022 is pretty peak-heavy; you can see that by looking at his RSS score, which is the highest we’ve seen from 81-100 (RSS is more peak friendly than the weighted sum in most cases). This makes sense, because Cole’s 2019 has the highest single-season GSDev we’ve encountered on the list so far, and I believe it will only be surpassed by one other entrant over the duration of the 100-51 segment of the rankings. And he complements that with five other ace seasons, three of which placed in their seasonal top 10s. (If #29 all-time seems high for Cole’s 2019, a year in which he did not win the Cy Young, don’t forget our old friend the postseason: five starts, 4-1, 1.72 ERA in 36.2 innings, 47 strikeouts to 11 walks. Very nice addition to an already outstanding regular season.)

One more Cole-related note: By sheer coincidence, 2022 appears to be a very nice cutoff point for a top-100 ranking of starting pitchers. This is true for two reasons. First, there weren’t many pitchers active in 2022 who were on the verge of cracking the list; best I can tell with the preliminary numbers I have so far, nobody looks likely to join the top 100 when 2023 and 2024 are added. And second, of the pitchers active in ’22 who were already in the top 100, only two of them appear to have improved their standing by more than a few places over the subsequent two seasons. One of those is Cole, who finally earned his long-awaited Cy Young in 2023. We’ll encounter the other shortly.

On to the next group, 80 to 71:

| Rank | Pitcher | Years | RSS | WtSum | Combo | bWAR Rank |
|---|---|---|---|---|---|---|
| 80 | Tommy Bridges | 1930-46 | 32.03 | 83.3 | 32.25 | 75 |
| 79 | Tommy John | 1963-89 | 32.19 | 83.5 | 32.38 | 43 |
| 78 | Javier Vazquez | 1998-2011 | 32.46 | 83.2 | 32.46 | 126 |
| 77 | Eppa Rixey | 1912-33 | 31.99 | 85.4 | 32.65 | 57 |
| 76 | Lefty Gomez | 1930-43 | 33.62 | 81.9 | 32.78 | 128 |
| 75 | Jacob deGrom | 2014-22 | 33.81 | 81.5 | 32.79 | 140 |
| 74 | Red Faber | 1914-33 | 32.43 | 85.2 | 32.83 | 33 |
| 73 | Frank Tanana | 1973-93 | 32.90 | 85.6 | 33.14 | 58 |
| 72 | Early Wynn | 1939-63 | 32.74 | 86.7 | 33.27 | 76 |
| 71 | Vida Blue | 1969-86 | 33.18 | 86.1 | 33.38 | 117 |

Javier Vazquez grades out as a top-80 pitcher since 1901 by GSDev; bWAR disagrees but still has him in the top 150. Vazquez last pitched in 2011, so his first eligibility for the Hall of Fame would have been in 2017. He was not offered as an option on the ballot.

I’m not saying I’d vote for Vazquez for the Hall of Fame. But being included on the ballot is practically guaranteed if you have at least 10 years in the majors (Vazquez, as you can see above, had 14). The 2017 ballot featured guys like Arthur Rhodes, a reliever with 33 career saves (and 15 career WAR), and Matt Stairs, a pretty good hitter who first exceeded 100 at-bats at age 28, and whose teams kept trying to play him in the field even though he was the personification of the DH role (the fielding kept him down to 14 career WAR). But not Javier Vazquez. He is probably the biggest omission from the Hall of Fame ballot since it has assumed its current form.

Outside of Vazquez (and Jacob deGrom, whose two Cy Young awards assure that Cooperstown voters will at least get to consider him when his time comes), this is predominantly a list of old-timers; four of the ten pitchers in this group retired before 1950. As such, it’s probably time to discuss how pitcher usage has changed over time, and the effect those changes have on the numbers.

There are two major shifts in usage of ace pitchers over the past century. First, starters are no longer expected to throw large numbers of complete games. In 1917, for instance, about 55% of all starts were completed. No single pitcher has completed half of his starts in a season (with a 10-start minimum) since 1988. Five individual pitchers in 1917 had at least 29 complete games; MLB as a whole in 2024 combined for 28.

Second, deadball aces often pitched in relief between starts. To go back to our modern pitchers from this set, Vazquez had seven relief outings in his career (plus two in the playoffs); through 2025, deGrom has none. By comparison, Red Faber had at least 19 starts in each of his first four MLB seasons, but also had 10 or more relief outings in all four of those years. Rixey had three seasons with similar totals of starts and relief outings. Gomez and Bridges relieved less (although still intermittently), but even Gomez (who had the fewest relief outings of our four old-timers in this set) still came out of the bullpen 48 times in his career, nearly all of them in seasons spent as a majority starting pitcher. This captures an overall league-wide trend. If I count correctly, there are nine seasons since 1901 in which the same pitcher led his league in complete games and saves; seven of these came in 1910 or earlier, and the last of them was in 1936 (Dizzy Dean).

This combination resulted in pitchers compiling workloads that would be inconceivable today. In the first 20 years of the AL/NL era, only one of the 42 league seasons (including the two Federal League years) didn’t have a single pitcher who threw 300 innings, and even that single instance wouldn’t have happened without a world war (Hippo Vaughn’s 290.1 innings led the abbreviated 1918 NL campaign). By contrast, the last 300-inning regular season came in 1980, and no pitcher has managed a seasonal total as high as 240 in over a decade. Of the four pre-integration pitchers in this group, probably the least durable was Tommy Bridges, who never led the league in innings during his career. Even so, each of Bridges’s inning totals from five consecutive seasons (1933-37) would have led the majors in every season since 2015.

What effect do these changes have on the rankings? The first is obvious; GSDev looks at starts only, so the pitchers who did significant relief work are having part of their value ignored. Red Faber pitched about 11% of his total innings as a reliever; he’s probably not underrated by the full 11%, but if you want to push him upward by half of that amount, I won’t object. The additional innings pitched also have an indirect effect on how the system looks at the old-timers, because in order to stay fresh enough to throw 300 innings in a season, the aces of a century ago would intentionally take it easy until they got into trouble. We know this because the pitchers themselves tell us – Christy Mathewson’s autobiography, Pitching in a Pinch, is named after this common practice. This means the pitchers were sacrificing some amount of their batter-by-batter performance for the sake of higher volume. And since GSDev doesn’t prioritize volume as heavily as WAR does, it will punish pitchers who make this tradeoff.

This is a very small part of a larger discussion of how to evaluate pitchers across eras. The changes in baseball over the last century-plus are numerous and diverse, and accounting for all of them is near-impossible. Ultimately, while the above points about the treatment of older pitchers carry some weight, I think they are more than counterbalanced by the expansion in both the league and the talent pool over the timespan we’re examining here. As such, I think GSDev’s mildly modernist leanings in the rankings are likely appropriate (even though they weren’t a designed feature).

That’s enough about older pitchers for now; let’s move on to the next group:

| Rank | Pitcher | Years | RSS | WtSum | Combo | bWAR Rank |
|---|---|---|---|---|---|---|
| 70 | Mordecai Brown | 1903-16 | 33.39 | 85.6 | 33.38 | 60 |
| 69 | Adam Wainwright | 2007-22 | 33.10 | 86.5 | 33.42 | 131 |
| 68 | Rick Reuschel | 1972-91 | 32.81 | 87.3 | 33.43 | 31 |
| 67 | Roy Oswalt | 2001-13 | 33.60 | 85.4 | 33.46 | 87 |
| 66 | Dennis Martinez | 1976-98 | 32.82 | 87.7 | 33.51 | 88 |
| 65 | Chris Sale | 2012-22 | 34.61 | 83.3 | 33.54 | 111 |
| 64 | Orel Hershiser | 1984-2000 | 33.35 | 87.5 | 33.74 | 77 |
| 63 | Jack Morris | 1977-94 | 33.56 | 89.0 | 34.14 | 123 |
| 62 | Cliff Lee | 2002-14 | 34.78 | 86.0 | 34.16 | 134 |
| 61 | Jon Lester | 2006-21 | 33.95 | 89.1 | 34.34 | 124 |

Did I say that was enough about older pitchers? Of the nine seasons mentioned above in which a pitcher led his league in both complete games and saves, Mordecai Brown had two of them (1909-10).

All right, elephant in the room time. Jack Morris is significantly higher in these rankings than would likely be expected from a stats-based method. We’ve talked about how the late ’70s and early ’80s were a low-deviation period, and Morris isn’t the only pitcher who benefits from that (Vida Blue and Fernando Valenzuela have also significantly outperformed their WAR rankings). But then how on earth is Morris ahead of Rick Reuschel, a relative contemporary whom WAR ranks 90-plus spots in front of him?

If you rank Morris and Reuschel’s best seasons by bWAR, Reuschel has six of the top seven. Switching to fWAR doesn’t change much; Reuschel pulls five of the top six, although the margins are narrower. GSDev, meanwhile, gives Reuschel two of the top three… but Morris seven of the top ten.

The difference seems to come down to three factors. First, there’s fielding evaluation in peak seasons. GSDev lists 1983 as easily Morris’s best year, at 12.39. You can see the appeal; he led the AL in innings and strikeouts, finished second in strikeout-to-walk ratio, and cracked the top 10 in ERA and FIP (the latter number being a career-best 3.38). Detroit’s fielders graded out well in ’83 (0.47 runs per 9 above average), but Morris seems to have benefited less than most of his teammates (both by FIP-ERA disparity and by BABIP). While GSDev ranks Morris’s ’83 as the fourth-best season in the majors that year, bWAR has him tied for #21. On the other hand, Reuschel’s best season was 1977, and his disparity runs the other way, with fielders 0.27 runs per 9 below average, but an ERA 0.24 runs better than his already-excellent FIP. As a result, bWAR sees Reuschel as the best pitcher in baseball in ’77, while GSDev places him at #10. In Morris’s case, this type of disparity persists through his career, and as a result, fWAR has him roughly a dozen wins higher than bWAR (albeit still well behind Reuschel).

Second, there’s in-game durability. Morris and Reuschel made nearly identical numbers of regular season starts in their careers (527 and 529, respectively), but Morris threw about 275 more innings, averaging 7.1 innings per start to Reuschel’s 6.6. This serves as a counterbalance to Reuschel’s allowing fewer runs (even when controlling for league context). Their career average adjusted Game Scores were 53.0 for Morris and 53.5 for Reuschel, and about half of Reuschel’s advantage is accounted for by their slightly different league contexts.

The third factor is, as always, differences in league deviation. This is a bigger factor than you might expect, since Morris and Reuschel’s careers overlapped heavily. But Reuschel missed a lot of time in the early ’80s (18 combined starts from 1982-84), and thus didn’t benefit from as many low-deviation seasons. Also, the bits of their careers that didn’t overlap saw Reuschel in the fairly high-deviation early ’70s, and Morris in the comparatively normal early ’90s.

Side note – if you were expecting me to bring up Morris’s postseason success, yes, that probably is a small factor, especially because Reuschel struggled in the playoffs. But ultimately the two of them combined for 20 postseason starts compared to 1056 in the regular season, and while Morris certainly had his moments, he also had a couple of less-impressive playoff outings. I’m confident they would still be effectively tied on regular season record alone.

That’s quite a bit of commentary without mentioning the five post-2000 pitchers in this group, only one of whom is top-100 by bWAR. But I’m not sure there’s much to say beyond another refrain of the song we’ve been singing for this entire post; GSDev treats the lower volume of the modern pitching season more gently than WAR does. Also, as promised, this is where we find the other pitcher who’s improved his position by a substantial margin since 2022. Unlike Gerrit Cole, whose 2023 Cy Young felt like the natural conclusion to a long stretch of excellence, Chris Sale’s 2024 Cy was thoroughly unexpected, coming off of half a dozen injury-plagued years. But expected or no, Sale’s recent efforts should give him quite a boost once 2023-24 numbers are finalized.

Let’s finish up this extra-long post with one more group, 60-51:

| Rank | Pitcher | Years | RSS | WtSum | Combo | bWAR Rank |
|---|---|---|---|---|---|---|
| 60 | Billy Pierce | 1948-64 | 34.04 | 89.5 | 34.46 | 70 |
| 59 | Jerry Koosman | 1967-85 | 33.91 | 90.6 | 34.63 | 59 |
| 58 | Kevin Appier | 1989-2004 | 34.78 | 89.3 | 34.81 | 66 |
| 57 | Ted Lyons | 1923-46 | 34.28 | 91.0 | 34.88 | 35 |
| 56 | Andy Pettitte | 1995-2013 | 34.51 | 90.6 | 34.93 | 48 |
| 55 | Ron Guidry | 1975-88 | 35.42 | 88.4 | 34.95 | 96 |
| 54 | Eddie Plank | 1901-17 | 34.65 | 91.0 | 35.07 | 13 |
| 53 | Steve Rogers | 1973-85 | 35.32 | 89.7 | 35.16 | 114 |
| 52 | Whitey Ford | 1950-67 | 34.78 | 92.1 | 35.34 | 69 |
| 51 | Cole Hamels | 2006-20 | 35.13 | 92.4 | 35.59 | 55 |

OK, we just did a detailed one-to-one comparison, so I’ll spare you Steve Rogers vs. Eddie Plank, which mostly boils down to Plank’s low peak (11 ace seasons but only 3 in the top half) and the difference in deviation between their eras, which we’ve gone through quite a few times already. It is notable that Plank is the only member of the bWAR top 30 to fall short of the GSDev top 50, and his bWAR rank of #13 clears that top-30 bar with room to spare. Meanwhile, in a mild spoiler, Rogers is the highest-ranked pitcher in GSDev to miss the bWAR top 100.

Having touched on the two pitchers whose evaluations diverge wildly between systems, it’s worth pointing out that they are the exception for this group of pitchers; half of this set of ten have GSDev and bWAR rankings within 10 places of each other. That includes Jerry Koosman ranking exactly at #59 in both systems, one of two pitchers in the top 100 of whom that is true (we’ll see the other one higher up).

That takes us through the bottom half of the top 100. The scores admittedly look close together throughout the whole collection, with Hamels’s margin over Bumgarner (35.59 to 30.20) being less than 20%. So let’s close out by going back to seasonal ordinal rankings for another comparison. Here are our five groups of 10, each listed by ace seasons (Ace), top-half and top-quartile ace seasons (THA and TQA), and #1 seasons (No1):

| Group | Ace | THA | TQA | No1 |
|---|---|---|---|---|
| 100-91 | 54 | 33 | 17 | 2 |
| 90-81 | 68 | 41 | 21 | 3 |
| 80-71 | 60 | 37 | 24 | 5 |
| 70-61 | 71 | 53 | 22 | 2 |
| 60-51 | 81 | 45 | 20 | 4 |

If you look from one group to the next, the difference isn’t always immediately obvious. But if you contrast the groups on either end, it clarifies quite a bit more. You can also match up pairs of contemporaries between the end groups (Lyons vs. Derringer, Pettitte vs. Beckett, Koosman vs. Lolich, Hamels vs. Bumgarner), and the advantages are tough to argue with in each case.

Still, though, we have yet another reminder of the relatively small differences you tend to find in the back half of a top-100 list. Next time, we’ll cover pitchers 50 through 11 and see if we can’t find a bit more breathing room between some all-time greats.
