Monday, October 6, 2025

Starting Pitcher Ratings: Adjusted Game Score

So far in this series, we’ve discussed why an alternative to pitching WAR might be beneficial, and introduced Game Score as the basis on which an alternative might be built. Now, let’s put Game Score through a few basic adjustments so we can make it more viable for historical comparisons.

We’ve seen already how Game Score takes a variety of results (innings, hits, runs, strikeouts, etc.) and combines them into a single number reflecting the effectiveness of a pitcher’s performance in a particular start. But there are still a few obvious factors in the pitcher’s results that have yet to be accounted for: what team was he facing, and when and where did the game take place?

The approach here has two steps. First, find the expected runs per game by the opponent in the park and year in question. To do this, start with the opponent’s raw scoring rate in runs per game, then adjust for their overall park factor (including both home and road games) to get their normalized scoring rate, then multiply that by the park factor of the stadium that hosted the game in question.
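For readers who prefer code, here is a minimal sketch of step 1. It assumes park factors are expressed as multipliers centered on 1.0 (so 1.10 means 10% more scoring than a neutral park) and that "adjusting for" the overall park factor means dividing by it. The function name, parameter names, and sample inputs are all made up for illustration.

```python
def expected_opponent_runs(raw_rpg, overall_pf, game_park_pf):
    """Park-adjusted opponent scoring, in runs per game, for one start.

    raw_rpg      -- opponent's raw runs per game for the season
    overall_pf   -- opponent's overall park factor (home and road), 1.0 = neutral (assumed scale)
    game_park_pf -- park factor of the stadium hosting this game, 1.0 = neutral (assumed scale)
    """
    # Remove the opponent's own park effects to get a neutral scoring rate.
    normalized_rpg = raw_rpg / overall_pf
    # Re-apply the effects of the park where this game was actually played.
    return normalized_rpg * game_park_pf


# Hypothetical inputs: a 4.50 R/G offense from a slightly pitcher-friendly
# overall context (0.95), visiting a strong hitter's park (1.20).
example = expected_opponent_runs(4.50, 0.95, 1.20)
print(round(example, 2))
```

With neutral factors (1.0 for both), the function simply returns the opponent's raw scoring rate.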

If the gritty details of park adjustments don’t interest you, skip past the formulas that are a couple paragraphs down. For anyone still reading, I’m using a regressed park factor that covers up to three years, depending on how many of the surrounding seasons the team spent in the same park. (Yes, it’s probably more rigorous to use more seasons of park data, but that also adds more complications when teams move to a new park and would delay finalization of seasonal results even further. As it is, I already had to wait for the 2025 regular season to end so I could run preliminary rankings for 2024.)

The formulas below apply to both the overall park factor and the home park factor. APF is the adjusted park factor, PF(0) is the raw park factor for the season under examination, and PF(-1) and PF(1) are the raw factors for the previous and next seasons, respectively. The formulas are:

One year in park: APF = 0.67*PF(0) + 0.33

Two years: APF = 0.5*PF(0) + 0.25*PF(±1) + 0.25

Three years: APF = 0.4*PF(0) + 0.21*(PF(1) + PF(-1)) + 0.18
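A minimal sketch of the three cases as one Python function. It assumes park factors on a scale where 1.0 is neutral, which the constant terms suggest (each set of weights sums to 1, so the constant acts as weight on a neutral park). The function name and the use of None for a season the team did not spend in the same park are illustrative choices, not part of the formulas themselves.

```python
def adjusted_park_factor(pf_current, pf_prev=None, pf_next=None):
    """Regressed park factor (APF) from up to three seasons in the same park.

    pf_current -- PF(0), raw park factor for the season under examination
    pf_prev    -- PF(-1), or None if the team was not in this park that year
    pf_next    -- PF(1), or None if the team was not in this park that year
    Assumes 1.0 represents a neutral park.
    """
    neighbors = [pf for pf in (pf_prev, pf_next) if pf is not None]

    if len(neighbors) == 0:
        # One year in park
        return 0.67 * pf_current + 0.33
    if len(neighbors) == 1:
        # Two years: PF(±1) is whichever adjacent season is available
        return 0.5 * pf_current + 0.25 * neighbors[0] + 0.25
    # Three years
    return 0.4 * pf_current + 0.21 * (pf_prev + pf_next) + 0.18


# Hypothetical raw factors, for illustration only.
print(round(adjusted_park_factor(1.20), 3))
print(round(adjusted_park_factor(1.20, pf_prev=1.10), 3))
print(round(adjusted_park_factor(1.20, pf_prev=1.10, pf_next=1.00), 3))
```

Note that in every case a raw factor of exactly 1.0 in all available seasons comes back as 1.0, and anything more extreme is pulled toward neutral.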

Park adjusted opponent scoring can have a pretty wide range even within a single season. If you compare the figures across history, the gap is often prodigious. The 1999 Giants have an enormous expected output of 8.36 runs per game in Coors Field, while the 1968 Mets in Dodger Stadium would anticipate an anemic 2.47. (Just for fun – the ’99 Giants actually scored 58 runs in their 6 games in Coors, averaging 9.67; the ’68 Mets put up 21 in 9 games in Dodger Stadium, or 2.33 per.)

None of the above is original; park and league context adjustments are among the earliest topics of sabermetric study. The question for our purposes is, how do you apply this to Game Score, which is not laid out in run-based units?

This leads us to step 2: develop a Game Score adjustment based on scoring environment. Early in my pitching-based work, I decided to use a single formula across history for this purpose rather than modifying it year-to-year. This approach has benefits and drawbacks. One of the primary benefits at the time was making the numbers easier to work with, but it’s also nice to have a fixed formula that can capture changes in average starting pitcher production over time. The downside is a fairly marginal loss in accuracy in the adjustment, which would primarily show up in extremes that most pitchers won’t see very often (even the 1968 Dodgers don’t face the Mets at home in every start, after all).

To generate this adjustment, I pulled the GS2 numbers for three seasons: 1965, 1978, and 1994. This gave a wide range of scoring environments, but kept the league context reasonably modern without being influenced by the wonkier tendencies of current-day pitching (such as the opener). I ran a regression of GS2 against total runs scored in the game for each year, which gave the following results:

1965: GS2 = 75.7 – 2.79*R

1978: GS2 = 74.6 – 2.68*R

1994: GS2 = 72.6 – 2.46*R

Note that for any relatively normal scoring context (say, total runs per game between 7 and 11), there is less than one point of difference between the outcome of any of these formulas; for R=9 and R=10, the differences are 0.2 or less. For simplicity’s sake, I ended up using a rounded compromise formula:

Expected GS2 = 75 – 2.7*R
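To see how close the three yearly fits are to each other, and where the compromise formula lands, the coefficients above can be plugged into a short script. This is only a check on the published coefficients, not a rerun of the regressions; the names FITS, expected_gs2, and spread are invented for the example.

```python
# (intercept, slope) for each season's fit of GS2 against total runs per game
FITS = {
    1965: (75.7, 2.79),
    1978: (74.6, 2.68),
    1994: (72.6, 2.46),
}


def expected_gs2(intercept, slope, total_runs):
    """Expected GS2 from a linear fit: intercept - slope * R."""
    return intercept - slope * total_runs


def spread(total_runs):
    """Largest gap between any two of the yearly fits at a given R."""
    values = [expected_gs2(a, b, total_runs) for a, b in FITS.values()]
    return max(values) - min(values)


# Compare the fits across the "relatively normal" range of 7 to 11 total runs.
for r in range(7, 12):
    compromise = 75 - 2.7 * r
    print(r, round(spread(r), 2), round(compromise, 1))
```

The spread is widest at the edges of the range and narrowest around 9 total runs per game, which is where the compromise formula sits close to 50.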

If you want the average to end up as 50, you adjust the pitcher’s actual Game Score by adding the difference between 50 and this value. Also, since R was expressed in total runs per game (rather than runs per game for the individual team), we double its coefficient so the formula can take park-adjusted opponent scoring (PAOS), which is a single-team figure, as its input:

GS2 Adjustment = 5.4*(PAOS) – 25

For a pitcher facing the aforementioned ’99 Giants in Colorado, this adjustment is +20.2 points of Game Score; for the ’68 Mets visiting LA, it’s -11.7.
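The arithmetic behind those two figures can be sketched in a few lines. The function names are illustrative; the only inputs are the compromise formula and the rounded PAOS values quoted earlier (8.36 and 2.47). With those rounded inputs, the Coors figure comes out a hair under the +20.2 quoted above, presumably because the published 8.36 is itself rounded.

```python
def gs2_adjustment(paos):
    """Game Score adjustment for a given park-adjusted opponent scoring rate.

    Expected GS2 = 75 - 2.7 * R, where R is total runs per game. Using
    R = 2 * PAOS and re-centering on 50 gives 5.4 * PAOS - 25.
    """
    expected = 75 - 2.7 * (2 * paos)
    return 50 - expected


def adjusted_game_score(gs2, paos):
    """Actual Game Score plus the scoring-environment adjustment."""
    return gs2 + gs2_adjustment(paos)


print(round(gs2_adjustment(8.36), 2))  # '99 Giants at Coors Field
print(round(gs2_adjustment(2.47), 2))  # '68 Mets at Dodger Stadium

# A hypothetical raw Game Score of 60 in each environment:
print(round(adjusted_game_score(60, 8.36), 1))
print(round(adjusted_game_score(60, 2.47), 1))
```

The adjustment is zero when PAOS is 25/5.4, or about 4.63 runs per game; tougher environments than that add points, and easier ones subtract them.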

What do these adjustments look like over a full season? We’ll stick with ’68 and ’99 as extremes in each direction. In 1968, the pitcher with the most hitter-friendly set of opponents and environments (with at least 25 starts) was Joe Niekro of the Cubs, whose Game Scores are adjusted by an average of -4.7 points; on the pitcher-friendly side, it’s LA’s Don Drysdale at -7.9. In 1999, the friendliest conditions were given to San Francisco’s Shawn Estes at -0.1, while the most hostile were unsurprisingly suffered by the poor souls condemned to Colorado, with Brian Bohanon taking the crown at +8.8. Note the complete lack of overlap between the two seasons; in fact, there’s less distance between the extremes in 1968 (3.2 points) than there is between the highest ’68 adjustment and the lowest ’99 adjustment (4.6 points).

How does the formula fare over time? Average opponent-park adjusted GS2 (OPAGS2) by decade:

Years      OPAGS2
1901-09    56.1
1910-19    54.0
1920-29    52.1
1930-39    51.6
1940-49    50.8
1950-59    49.6
1960-69    50.2
1970-79    50.0
1980-89    49.4
1990-99    49.6
2000-09    49.0
2010-19    49.4
2020-24    48.9

That stabilizes reasonably quickly; once the early modern game sets in (integration and expansion), the average only varies by a point or so. There’s still an obvious difference in the earliest part of our dataset, but that’s understandable – and ultimately we’ll be adjusting for league context on a yearly basis, so the final numbers won’t be polluted by this change.

Speaking of polluting the numbers, one additional note I should make before talking results: I am including postseason starts in the consideration set. Yes, this is different from my approach in the weighted WAR system, but in that case I was constrained by what Baseball Reference factors into its WAR calculations. Here, I’m building my own system and can use whatever data I choose. And I choose to include playoff starts for two reasons. First, the postseason is considered MORE important than the regular season by the teams, players, and fans. For pitchers in particular, good or bad postseason performance can have an enormous impact on how the player is regarded. (Ask Madison Bumgarner... or Clayton Kershaw.) Second, postseason participation often costs pitchers regular season starts, either in the same year (as the team manages workload and sets the playoff rotation) or in the future (due to wear and tear, and sometimes due to immediate injury). Counting postseason stats does raise some issues of fairness to pitchers on bad teams, but it strikes me as being a more reasonable option than ignoring them.

All right, you’ve stuck with me through three posts (and a few dozen repetitions of the word “adjustment”), so it’s time for our first actual fun table of numbers. Here are the top 50 single seasons in average adjusted Game Score (25-start minimum):

Rank  Year  Pitcher            Starts  OPAGS2
1     2000  Pedro Martinez     29      79.1
2     1999  Pedro Martinez     31      76.2
3     1913  Walter Johnson     36      74.9
4     1910  Ed Walsh           36      74.3
5     1912  Walter Johnson     37      74.0
6     1994  Greg Maddux        25      73.5
7     1918  Walter Johnson     29      73.4
8     1901  Cy Young           41      73.0
9     1997  Roger Clemens      34      72.9
10    1968  Bob Gibson         37      72.6
11    1905  Ed Reulbach        29      72.3
12    1997  Pedro Martinez     31      72.2
13    1931  Lefty Grove        33      72.1
14    1995  Greg Maddux        33      71.9
15    1924  Dazzy Vance        34      71.8
16    1915  Pete Alexander     44      71.6
17    1910  Russ Ford          33      71.5
18    1909  Mordecai Brown     34      71.4
19    1919  Walter Johnson     29      71.3
20    1905  Christy Mathewson  40      71.3
21    2001  Randy Johnson      39      71.3
22    1902  Rube Waddell       27      71.0
23    1995  Randy Johnson      33      70.7
24    1912  Smoky Joe Wood     41      70.6
25    1946  Hal Newhouser      34      70.1
26    1928  Dazzy Vance        32      69.8
27    1908  Mordecai Brown     32      69.8
28    1985  Dwight Gooden      35      69.7
29    1910  Walter Johnson     42      69.7
30    1911  Ed Walsh           37      69.6
31    1965  Sandy Koufax       44      69.5
32    1908  Cy Young           33      69.5
33    1940  Bob Feller         37      69.4
34    1936  Lefty Grove        30      69.3
35    1915  Walter Johnson     39      69.3
36    1902  Cy Young           43      69.3
37    1932  Lefty Grove        30      69.2
38    1999  Randy Johnson      36      69.2
39    1972  Steve Carlton      41      69.2
40    1936  Carl Hubbell       36      69.1
41    1969  Bob Gibson         35      69.1
42    1914  Russ Ford          26      69.0
43    1971  Tom Seaver         35      68.9
44    1914  Walter Johnson     40      68.9
45    1986  Mike Scott         39      68.9
46    1911  Vean Gregg         26      68.8
47    1914  Claude Hendrix     37      68.8
48    1902  Bill Bernhard      25      68.7
49    1937  Lefty Gomez        36      68.6
50    1935  Cy Blanton         30      68.6

If I may say so, that’s a pretty fun list. You can definitely see the effects of the change in pitcher usage – Russ Ford makes as many appearances as Roger Clemens and Sandy Koufax combined. But I don’t mind letting the deadball pitchers have a moment in the sun; you can already infer from the table of league averages that their numbers are going to have some serious air let out moving forward.

Also, even with the deadball advantage on the table, the difference between Pedro Martinez’s highest average and anyone else’s highest average is the same as the gap between #3 and #23 on the list. Pedro, it turns out, was pretty good.

Pedro was not, however, especially durable; note that his exceptional 2000 season included only 29 starts, while several of the seasons behind him had totals in the high 30s and a few even cleared 40. Next time, we’ll take our context-adjusted Game Score and work on turning it into a more robust measure of seasonal performance, one that balances excellence and availability.