Monday, November 3, 2025

Starting pitcher ratings: Career evaluation

So far in this series, we’ve reviewed the existing pitcher rating systems, introduced and refined Game Score as the basis for an alternative method, introduced deviations in Game Score as the structure of a new metric, and examined the single-season results of Game Score Deviations (GSDev), first laying out the scale and then running through the top 100 single-season scores from 1901-2022. Now, it’s time to start working on what most of you have probably been looking forward to all along: career evaluations.

The first thing I tried in forming a career ranking was simply adding up the single-season GSDev numbers and seeing who has the highest total. Appealing as the simple option is, in the case of GSDev it doesn’t really work. Unlike with WAR, the metric doesn’t function in an additive way (for instance, a player traded midseason will have a combined GSDev for the year that doesn’t equal the sum of his GSDev scores for each team), and mediocre seasons score too high in comparison to very good years to make straightforward addition a viable option. If you take, say, three of Hipolito Pichardo’s 1993 season (7-7, 4.00 ERA, 3.7 K/9 in 155.1 innings as a starter; 5.35 GSDev), you would exceed the GSDev total of his teammate Kevin Appier (18-8, 2.56 ERA, 2.90 FIP, 238.2 innings, 15.89 GSDev - one of the top 100 scores of all time). I suspect you could not have gotten Appier in trade for three Pichardos that year. As such, adding up raw GSDev scores gives the kind of results you might expect from a system that proportionately overvalues mediocre seasons, things like Don Sutton (with his long career and relatively low peak) on the edge of the top 10.
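The Pichardo-for-Appier arithmetic is easy to check. This is only a minimal sketch using the two GSDev figures quoted above; the variable names are mine, not part of the system.

```python
# GSDev values quoted in the paragraph above (1993 seasons).
PICHARDO_1993 = 5.35
APPIER_1993 = 15.89

# Straight addition treats three Pichardo seasons as one lump sum.
three_pichardos = 3 * PICHARDO_1993

print(f"Three Pichardo seasons, summed: {three_pichardos:.2f}")
print(f"One Appier season:              {APPIER_1993:.2f}")
print("Raw sum favors the three Pichardos:", three_pichardos > APPIER_1993)
```

Under straight addition the three mediocre seasons edge out one of the best seasons on record, which is the scaling problem in a nutshell.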

If the scaling doesn’t work when adding up the basic scores, what if we don’t use the raw numbers at all? Why not just use ordinal seasonal rankings? There’s some mileage to be gained here, especially if you scale for league size; I’m particularly fond of designating pitchers as “ace” (among the top N pitchers of the year), “top-half ace” (top N/2), or “top-quartile ace” (top N/4), where N is the league size. Only six pitchers had at least 15 ace seasons: alphabetically, Grover Cleveland Alexander, Roger Clemens, Walter Johnson, Nolan Ryan, Tom Seaver, and Warren Spahn. Even fewer had at least 10 top-quartile ace years: Clemens, Lefty Grove, and Randy and Walter Johnson. Those are pretty reasonable lists of great pitchers.
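For anyone who wants to play along at home, the tiers can be sketched in a few lines. Some assumptions on my part here: `ace_tier` and `n_teams` are illustrative names, I am reading “league size” as the number of teams, and I am treating the cutoffs as inclusive (a rank of exactly N/2 counts as top-half).

```python
from typing import Optional


def ace_tier(rank: int, n_teams: int) -> Optional[str]:
    """Return the best tier a seasonal rank qualifies for, or None.

    Assumes inclusive cutoffs at N/4, N/2, and N, where N is n_teams.
    """
    if rank < 1 or n_teams < 1:
        raise ValueError("rank and n_teams must be positive")
    if rank <= n_teams / 4:
        return "top-quartile ace"
    if rank <= n_teams / 2:
        return "top-half ace"
    if rank <= n_teams:
        return "ace"
    return None


# Hypothetical 26-team league: the cutoffs fall at ranks 6.5, 13, and 26.
for rank in (4, 10, 18, 42):
    print(rank, ace_tier(rank, 26))
```

The tiers are nested, so a top-quartile season is also a top-half season and an ace season; when counting ace seasons, anything that doesn’t come back as `None` qualifies.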

But much as I like toying with ordinal rankings, ultimately not all #1 seasons are the same, nor are all #5 or #10 or #N/2 efforts. Going back to the top 100 seasons list, there were 28 years from 1901-2022 that included multiple top-100 efforts, making up over 60% of the list, while 56 different seasons didn’t include a single top-100 entry. This could vary wildly even over a brief timespan; from 1966-71, MLB produced 2, 0, 2, 1, 0, and 2 top-100 seasons, respectively. This remains true if you go further down the all-time seasons list. While 1939 has two top-100 seasons, the preceding year produced no seasons in the top 500. Equating Bob Feller’s 1939 (16.92 GSDev) to Red Ruffing’s 1938 (12.20) just because they both led the league is unsatisfying at best, and disingenuous at worst. (As a bonus note, Ruffing’s 1939 score was also 12.20, coming in ahead of 1938 by a tiny fraction. He finished seventh in MLB that season.)

So what are we to do with the seasonal numbers if raw addition doesn’t work? I can think of two options. (Well, I can think of a lot more than two, but I’m going to introduce the two that I like the most.) But in order to introduce them, I should probably pick a pitcher to use as an example. Let’s take someone inoffensive – say, a pitcher who was often an ace (8 times) and a top-half ace (7), but rarely made the top quartile (once, with a fourth-place finish). Someone with a fairly long career (between 500 and 600 starts), not a ton of regular season awards but some postseason success. Someone who’s been retired for around 30 years, so people are reasonably likely to remember him, but presumably any controversy associated with him has long blown over. Jack Morris, come on down!

Here are Morris’s GSDev totals and yearly ranks in their raw form:

Year     Starts  Adj GS2   GSDev  Rnk
1977          6     52.4    2.86  100
1978          7     40.2   -0.88  207
1979         27     57.9   10.66    9
1980         36     51.4    7.10   42
1981         25     58.4   10.64    8
1982         37     50.4    6.65   46
1983         37     58.7   12.71    4
1984         38     54.3    9.24   18
1985         35     57.6   10.95   11
1986         35     57.7   10.86   10
1987         35     56.6   10.58   13
1988         34     51.5    6.09   53
1989         24     45.2    1.97  117
1990         36     49.0    5.15   67
1991         40     56.7   10.79   10
1992         38     51.6    6.97   42
1993         27     41.9   -0.14  220
1994         23     47.4    2.78   93
Tot/Avg     540     53.0  124.96

Morris was often very good, but rarely great; his best season (1983) barely cracks the top 400 all-time (to be specific, it’s #396), and you have to go another 400-plus down the list to find his second-best effort. But he also has seven seasons of at least 10 GSDev, which is more than some very, very good pitchers managed (including Bob Feller, Dazzy Vance, Robin Roberts, and Tom Glavine, along with the obvious Koufax-style short-career cohort).

So how do we combine those numbers? The first option goes back to the concept of deviations and its quasi-basis in statistical analysis. When combining multiple deviations in a measurement, the proper approach is generally not to add them together; it is to square them, add the squares, and then take the square root of the sum. The effect of this option is to put a significant emphasis on very high scores. To go back to the Hipolito Pichardo vs. Kevin Appier comparison earlier, Appier’s 1993 GSDev was about three times Pichardo’s (15.89 vs. 5.35). If you square them, that becomes roughly a 9-to-1 ratio. Having a long career still matters, but a long career of unimpressive seasons will have a much harder time catching up to a pitcher with a few standout years.

(As a small corrective measure, negative values will be omitted from the sum of squares, since if you square a negative it becomes a positive and you’re therefore rewarding large negative scores. Yes, there are workarounds for this, but omitting the negatives entirely is the one I prefer, for reasons I have discussed previously in other contexts.)
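Here is a minimal sketch of that calculation, assuming nothing beyond what’s described above: square each season, skip the negatives, add, and take the root. The function name `root_sum_square` is just my label for illustration.

```python
import math
from typing import Iterable


def root_sum_square(gsdevs: Iterable[float]) -> float:
    """Square root of the sum of squared seasons, omitting negative seasons."""
    return math.sqrt(sum(g * g for g in gsdevs if g > 0))


# Revisit the earlier comparison: three Pichardo 1993s vs. one Appier 1993.
three_pichardos = root_sum_square([5.35, 5.35, 5.35])
one_appier = root_sum_square([15.89])

print(f"Three Pichardos: {three_pichardos:.2f}")
print(f"One Appier:      {one_appier:.2f}")
```

Where straight addition put the three Pichardo seasons narrowly ahead, this approach leaves them a bit under 9.3, well short of Appier’s single year.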

Here is Morris from the perspective of adding the squares:

Year     Starts  Adj GS2   GSDev  SqGSDev
1977          6     52.4    2.86      8.2
1978          7     40.2   -0.88      0.0
1979         27     57.9   10.66    113.7
1980         36     51.4    7.10     50.4
1981         25     58.4   10.64    113.2
1982         37     50.4    6.65     44.2
1983         37     58.7   12.71    161.5
1984         38     54.3    9.24     85.4
1985         35     57.6   10.95    119.9
1986         35     57.7   10.86    117.9
1987         35     56.6   10.58    111.9
1988         34     51.5    6.09     37.1
1989         24     45.2    1.97      3.9
1990         36     49.0    5.15     26.5
1991         40     56.7   10.79    116.4
1992         38     51.6    6.97     48.5
1993         27     41.9   -0.14      0.0
1994         23     47.4    2.78      7.7
Tot/Avg     540     53.0  124.96   1166.4

The square root of that total is about 34.15. That doesn’t mean a whole lot without anything to compare it to, except to note that if you take this seriously as a method of determining combined deviation, Morris’s 540-start career is about 43% farther from “pretty bad pitcher” than Pedro Martinez’s 29-start 2000 season.

For the second option, I’m bringing back an old friend: the peak-weighted sum. Rather than just adding up the raw GSDev values, you reduce the relative importance of each season as you get farther from the pitcher’s peak (season #2 is weighted at 95%, #3 at 90%, on down to #20 and following at 5%). Negatives are, again, removed. This obviously puts some emphasis on peak (as the name implies), but not quite to the same extent as the square-sum method. For example, consider a pitcher who has two seasons, the lesser of which has a GSDev 80% as high as the better. When adding the second-best year, the pitcher’s peak-weighted sum will increase by 76%; his sum-of-squares will go up by only 64% (and the square root of that sum by only about 28%). The specifics of this calculation change depending on the length of the pitcher’s career, but it takes a long career indeed for sum-of-squares to shift in a less peak-heavy direction.
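The weighting scheme can be sketched as follows. Treat this as an illustration under stated assumptions rather than the exact code behind the numbers: `peak_weighted_sum` is a made-up name, and I am assuming seasons are ranked by GSDev after the negatives are dropped.

```python
from typing import Iterable


def peak_weighted_sum(gsdevs: Iterable[float]) -> float:
    """Weight the best season at 100%, the next at 95%, and so on, with a 5% floor."""
    ranked = sorted((g for g in gsdevs if g > 0), reverse=True)
    total = 0.0
    for i, g in enumerate(ranked):
        weight = max(100 - 5 * i, 5) / 100  # 1.00, 0.95, ..., 0.05, 0.05, ...
        total += weight * g
    return total


# Two-season example: the second season is 80% as good as the first.
best, second = 10.0, 8.0
weighted_gain = peak_weighted_sum([best, second]) / peak_weighted_sum([best]) - 1
squares_gain = (best**2 + second**2) / best**2 - 1

print(f"weighted sum: +{weighted_gain:.0%}, sum of squares: +{squares_gain:.0%}")
```

Working in whole percentages (`100 - 5 * i`) before dividing avoids floating-point drift in the weights, and the `max(..., 5)` keeps every season from #20 onward at the 5% floor.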

Here is Morris from the weighted sum perspective:

Year     Starts  Adj GS2   GSDev  Weight  WtDev
1977          6     52.4    2.86    0.35   1.00
1978          7     40.2   -0.88    0.15   0.00
1979         27     57.9   10.66    0.80   8.53
1980         36     51.4    7.10    0.60   4.26
1981         25     58.4   10.64    0.75   7.98
1982         37     50.4    6.65    0.50   3.32
1983         37     58.7   12.71    1.00  12.71
1984         38     54.3    9.24    0.65   6.01
1985         35     57.6   10.95    0.95  10.40
1986         35     57.7   10.86    0.90   9.77
1987         35     56.6   10.58    0.70   7.41
1988         34     51.5    6.09    0.45   2.74
1989         24     45.2    1.97    0.25   0.49
1990         36     49.0    5.15    0.40   2.06
1991         40     56.7   10.79    0.85   9.17
1992         38     51.6    6.97    0.55   3.83
1993         27     41.9   -0.14    0.20   0.00
1994         23     47.4    2.78    0.30   0.83
Tot/Avg     540     53.0  124.96          90.52

Without giving away his ordinal position in either approach, Morris is seven places higher in Weighted GSDev than he is in Sum-Squared GSDev. This is probably not a surprise, given his relatively flat, relatively long peak. When looking at the top 200 pitchers, the square root of the sum of the squares (or Root Sum Square, RSS for short) is, on average, about 39% of the weighted sum; Morris sits at 37.7%, which is consistent with weighted sum being his stronger category.

So which measure is better? Given the nature of my objections to the extremizing tendencies of the various pitching WAR systems, it may not surprise you to learn that I tend toward compromise. As such, my career measure of choice will be the average of (weighted sum * 0.39) and root-sum-square. Morris’s overall combined score is 34.73.
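To tie the pieces together, here is a rough end-to-end sketch for Morris, using the rounded seasonal GSDev values from the tables above. The function names and the `scale` parameter are my own labels for illustration, and because the inputs are rounded to two decimals, the results may differ from the quoted figures in the last digit.

```python
import math

# Morris's seasonal GSDev values, 1977-1994, as listed in the tables above.
MORRIS = [2.86, -0.88, 10.66, 7.10, 10.64, 6.65, 12.71, 9.24, 10.95,
          10.86, 10.58, 6.09, 1.97, 5.15, 10.79, 6.97, -0.14, 2.78]


def root_sum_square(gsdevs):
    """Square root of the sum of squares, omitting negative seasons."""
    return math.sqrt(sum(g * g for g in gsdevs if g > 0))


def peak_weighted_sum(gsdevs):
    """Best season at 100%, then 95%, 90%, ... down to a 5% floor."""
    ranked = sorted((g for g in gsdevs if g > 0), reverse=True)
    return sum(max(100 - 5 * i, 5) / 100 * g for i, g in enumerate(ranked))


def combined_score(gsdevs, scale=0.39):
    """Average of the scaled weighted sum and the root-sum-square."""
    return (scale * peak_weighted_sum(gsdevs) + root_sum_square(gsdevs)) / 2


rss = root_sum_square(MORRIS)
weighted = peak_weighted_sum(MORRIS)
print(f"RSS {rss:.2f}, weighted sum {weighted:.2f}, ratio {rss / weighted:.1%}")
print(f"combined score {combined_score(MORRIS):.2f}")
```

The 0.39 multiplier is what puts the two measures on the same footing; it comes from the average RSS-to-weighted-sum ratio among the top 200 pitchers mentioned above.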

Where does that place him overall? Next time, we’ll start going through the 100 greatest starting pitching careers of the last twelve-plus decades, and we may just get an answer to that question.
