So far in this series, we’ve discussed why an alternative to pitching WAR might be beneficial, and introduced Game Score as the basis on which an alternative might be built. Now, let’s put Game Score through a few basic adjustments so we can make it more viable for historical comparisons.
We’ve seen
already how Game Score takes a variety of results (innings, hits, runs, strikeouts, etc.) and combines them into a
single number reflecting the effectiveness of a pitcher’s performance in a
particular start. But there are still a few obvious factors in the pitcher’s
results that have yet to be accounted for: what team was he facing, and when and where did the game take place?
The approach here has two steps. First, find the opponent’s expected runs per game in the park and year in question. To do this, start with the opponent’s raw scoring rate in runs per game, adjust for their overall park factor (covering both home and road games) to get a normalized scoring rate, then multiply that by the park factor of the stadium that hosted the game in question.
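Here is a minimal sketch of that first step. It assumes park factors are expressed as ratios where 1.0 is neutral, and that "adjusting for" the overall park factor means dividing by it; both are assumptions made for illustration, and the function and parameter names are hypothetical.

```python
def park_adjusted_opponent_scoring(raw_rpg, overall_pf, game_pf):
    """Expected opponent runs per game in a specific park.

    raw_rpg    -- opponent's raw runs scored per game for the season
    overall_pf -- opponent's overall park factor (home and road games),
                  assumed to be a ratio where 1.0 is neutral
    game_pf    -- park factor of the stadium hosting the game (same scale)
    """
    if overall_pf <= 0:
        raise ValueError("park factor must be positive")
    # Strip out the effect of the parks the opponent usually plays in...
    normalized_rpg = raw_rpg / overall_pf
    # ...then apply the park where this particular game is played.
    return normalized_rpg * game_pf


# Made-up inputs: a 5.0 R/G offense with a neutral overall factor,
# visiting a park that inflates scoring by 20 percent.
print(round(park_adjusted_opponent_scoring(5.0, 1.0, 1.2), 2))
```

The two-stage structure matters: an offense that looks strong partly because of its own home park gets deflated before the game's park is applied.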
If the gritty details of park adjustments don’t interest you, skip past the formulas a couple of paragraphs down. For anyone still reading, I’m using a regressed park factor that draws on up to three seasons, depending on how many of the surrounding seasons the team spent in the same park. (Yes, it’s probably more rigorous to use more seasons of park data, but that also adds more complications when teams move to a new park and would delay finalization of seasonal results even further. As it is, I already had to wait for the 2025 regular season to end so I could run preliminary rankings for 2024.)
The formulas below apply to both the overall park factor and the home park factor. APF is the adjusted park factor, PF(0) is the raw park factor for the season under examination, and PF(-1) and PF(1) are the raw factors for the previous and next seasons, respectively. The formulas are:
One year in park: APF = 0.67*PF(0) + 0.33
Two years: APF = 0.5*PF(0) + 0.25*PF(±1) + 0.25
Three years: APF = 0.4*PF(0) + 0.21*(PF(1) + PF(-1)) + 0.18
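The three cases can be folded into one function. This is only a sketch: the name `adjusted_park_factor` and the convention of passing `None` for a neighboring season spent in a different park are my own, and it treats park factors as ratios centered on 1.0 (which the constant terms imply, since each set of weights sums to 1).

```python
def adjusted_park_factor(pf_current, pf_prev=None, pf_next=None):
    """Regress a raw park factor toward 1.0 using up to three seasons.

    Pass None for a neighboring season the team did not spend in the
    same park (hypothetical convention for this sketch).
    """
    neighbors = [pf for pf in (pf_prev, pf_next) if pf is not None]
    if len(neighbors) == 0:
        # One year in park
        return 0.67 * pf_current + 0.33
    if len(neighbors) == 1:
        # Two years: the current season plus whichever neighbor exists
        return 0.5 * pf_current + 0.25 * neighbors[0] + 0.25
    # Three years: both neighbors available
    return 0.4 * pf_current + 0.21 * (pf_prev + pf_next) + 0.18


# Made-up raw factors for illustration only.
print(round(adjusted_park_factor(1.20), 3))
print(round(adjusted_park_factor(1.20, pf_prev=1.10), 3))
print(round(adjusted_park_factor(1.20, pf_prev=1.10, pf_next=1.30), 3))
```

Note that the less data there is, the more weight goes to the neutral value: 0.33 with one season, 0.25 with two, 0.18 with three.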
Park-adjusted opponent scoring can have a pretty wide range even within a single season. If you compare the figures across history, the gap is often prodigious. The 1999 Giants have an enormous expected output of 8.36 runs per game in Coors Field, while the 1968 Mets in Dodger Stadium would anticipate an anemic 2.47. (Just for fun: the ’99 Giants actually scored 58 runs in their 6 games in Coors, averaging 9.67; the ’68 Mets put up 21 in 9 games in Dodger Stadium, or 2.33 per.)
None of the above is original; park and league context adjustments were among the first topics of sabermetric study. The question for our purposes is, how do you apply this to Game Score, which is not laid out in run-based units?
This leads us
to step 2: develop a Game Score adjustment based on scoring environment. Early
in my pitching-based work, I decided to use a single formula across history for this purpose
rather than modifying it year-to-year. This approach has benefits and drawbacks. One of
the primary benefits at the time was making the numbers easier to work with,
but it’s also nice to have a fixed formula that can capture changes in average
starting pitcher production over time. The downside is a fairly marginal loss
in accuracy in the adjustment, which would primarily show up in extremes that
most pitchers won’t see very often (even the 1968 Dodgers don’t face the Mets
at home in every start, after all).
To generate
this adjustment, I pulled the GS2 numbers for three seasons: 1965, 1978, and
1994. This gave a wide range of scoring environments, but kept the league
context reasonably modern without being influenced by the wonkier tendencies of
current-day pitching (such as the opener). I ran a regression of GS2 against
total runs scored in the game for each year, which gave the following results:
1965: GS2 = 75.7 – 2.79*R
1978: GS2 = 74.6 – 2.68*R
1994: GS2 = 72.6 – 2.46*R
Note that for
any relatively normal scoring context (say, total runs per game between 7 and
11), there is less than one point of difference between the outcome of any of
these formulas; for R=9 and R=10, the differences are 0.2 or less. For
simplicity’s sake, I ended up using a rounded compromise formula:
Expected GS2 = 75 – 2.7*R
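For anyone who wants to check how close the three yearly fits are, here is a small sketch that evaluates each one (using the coefficients quoted above) alongside the compromise formula. The function names are hypothetical.

```python
# Intercept and slope for each fitted line, copied from the text above.
FITS = {
    1965: (75.7, -2.79),
    1978: (74.6, -2.68),
    1994: (72.6, -2.46),
}


def expected_gs2(total_runs, intercept=75.0, slope=-2.7):
    """Expected GS2 given total runs per game (both teams combined).

    Defaults to the rounded compromise formula.
    """
    return intercept + slope * total_runs


def spread_between_fits(total_runs):
    """Largest gap between any two of the yearly fits at a given R."""
    values = [expected_gs2(total_runs, a, b) for a, b in FITS.values()]
    return max(values) - min(values)


# Columns: total runs, spread across the yearly fits, compromise value.
for runs in range(7, 12):
    print(runs, round(spread_between_fits(runs), 2), round(expected_gs2(runs), 1))
```

The spread stays under one point across the 7-to-11 range, which is why a single fixed formula costs so little accuracy in ordinary scoring environments.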
If you want the average to end up as 50, you adjust the pitcher’s actual Game Score by adding the difference between 50 and this value, which works out to 2.7*R – 25. Also, since R was expressed in total runs per game (rather than runs per game for the individual team), we double its coefficient so the formula can take park-adjusted opponent scoring (PAOS), a single team’s expected runs per game, as its input:
GS2 Adjustment = 5.4*(PAOS) – 25
For a pitcher facing the
aforementioned ’99 Giants in Colorado, this adjustment is +20.2 points of Game
Score; for the ’68 Mets visiting LA, it’s -11.7.
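The adjustment is simple enough to write out step by step. This sketch uses a hypothetical function name and feeds in the two rounded scoring rates quoted earlier; because those inputs are rounded to two decimals, the output can differ from the figures in the text by about a tenth of a point.

```python
def gs2_adjustment(paos):
    """Points to add to a raw GS2, given park-adjusted opponent scoring
    (expected runs per game for the opposing team alone)."""
    total_runs = 2 * paos             # both teams combined
    expected = 75 - 2.7 * total_runs  # expected GS2 in this environment
    return 50 - expected              # equals 5.4 * paos - 25


for label, paos in [("1999 Giants at Coors Field", 8.36),
                    ("1968 Mets at Dodger Stadium", 2.47)]:
    print(label, round(gs2_adjustment(paos), 2))
```

The break-even point is a PAOS of 25/5.4, or about 4.63 runs per game; anything above that earns the pitcher a bonus, anything below a penalty.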
What do these
adjustments look like over a full season? We’ll stick with ’68 and ’99 as
extremes in each direction. In 1968, the pitcher with the most hitter-friendly
set of opponents and environments (with at least 25 starts) was Joe Niekro of
the Cubs, whose Game Scores are adjusted an average of -4.7 points; on the
pitcher-friendly side, it’s LA’s Don Drysdale with a -7.9. In 1999, the
friendliest conditions were given to San Francisco’s Shawn Estes with a -0.1, while the most hostile were unsurprisingly suffered by the poor souls condemned to
Colorado, with Brian Bohanon taking the crown at +8.8. Note the complete lack
of overlap between the two seasons; in fact, there’s less distance between the extremes
in 1968 than there is between the highest ’68 adjustment and the lowest ’99
adjustment.
How does the
formula fare over time? Average opponent-park adjusted GS2 (OPAGS2) by
decade:
| Years | OPAGS2 |
|---|---|
| 1901-09 | 56.1 |
| 1910-19 | 54.0 |
| 1920-29 | 52.1 |
| 1930-39 | 51.6 |
| 1940-49 | 50.8 |
| 1950-59 | 49.6 |
| 1960-69 | 50.2 |
| 1970-79 | 50.0 |
| 1980-89 | 49.4 |
| 1990-99 | 49.6 |
| 2000-09 | 49.0 |
| 2010-19 | 49.4 |
| 2020-24 | 48.9 |
That stabilizes
reasonably quickly; once the early modern game sets in (integration and
expansion), the average only varies by a point or so. There’s still an obvious
difference in the earliest part of our dataset, but that’s understandable – and
ultimately we’ll be adjusting for league context on a yearly basis, so the final
numbers won’t be polluted by this change.
Speaking of
polluting the numbers, one additional note I should make before talking
results: I am including postseason starts in the consideration set. Yes, this
is different from my approach in the weighted WAR system, but in that case I
was constrained by what Baseball Reference factors into its WAR calculations.
Here, I’m building my own system and can use whatever data I choose. And I choose to include playoff starts for two reasons. First, the postseason is considered MORE important than the regular
season by the teams, players, and fans. For pitchers in particular, good or bad postseason performance can have an enormous impact on how the player is regarded. (Ask Madison Bumgarner... or Clayton Kershaw.) Second, postseason participation often costs pitchers regular season starts, either in the same year (as the team manages workload and sets the playoff rotation) or in the future (due to wear and tear, and sometimes due to immediate injury). Counting postseason stats does raise some issues of fairness to pitchers on bad teams, but it strikes me as being a more reasonable option than ignoring them.
All right,
you’ve stuck with me through three posts (and a few dozen repetitions of the
word “adjustment”), so it’s time for our first actual fun table of numbers. Here
are the top 50 single seasons in average adjusted Game Score (25-start
minimum):
| Rank | Year | Pitcher | Starts | OPAGS2 |
|---|---|---|---|---|
| 1 | 2000 | Pedro Martinez | 29 | 79.1 |
| 2 | 1999 | Pedro Martinez | 31 | 76.2 |
| 3 | 1913 | Walter Johnson | 36 | 74.9 |
| 4 | 1910 | Ed Walsh | 36 | 74.3 |
| 5 | 1912 | Walter Johnson | 37 | 74.0 |
| 6 | 1994 | Greg Maddux | 25 | 73.5 |
| 7 | 1918 | Walter Johnson | 29 | 73.4 |
| 8 | 1901 | Cy Young | 41 | 73.0 |
| 9 | 1997 | Roger Clemens | 34 | 72.9 |
| 10 | 1968 | Bob Gibson | 37 | 72.6 |
| 11 | 1905 | Ed Reulbach | 29 | 72.3 |
| 12 | 1997 | Pedro Martinez | 31 | 72.2 |
| 13 | 1931 | Lefty Grove | 33 | 72.1 |
| 14 | 1995 | Greg Maddux | 33 | 71.9 |
| 15 | 1924 | Dazzy Vance | 34 | 71.8 |
| 16 | 1915 | Pete Alexander | 44 | 71.6 |
| 17 | 1910 | Russ Ford | 33 | 71.5 |
| 18 | 1909 | Mordecai Brown | 34 | 71.4 |
| 19 | 1919 | Walter Johnson | 29 | 71.3 |
| 20 | 1905 | Christy Mathewson | 40 | 71.3 |
| 21 | 2001 | Randy Johnson | 39 | 71.3 |
| 22 | 1902 | Rube Waddell | 27 | 71.0 |
| 23 | 1995 | Randy Johnson | 33 | 70.7 |
| 24 | 1912 | Smoky Joe Wood | 41 | 70.6 |
| 25 | 1946 | Hal Newhouser | 34 | 70.1 |
| 26 | 1928 | Dazzy Vance | 32 | 69.8 |
| 27 | 1908 | Mordecai Brown | 32 | 69.8 |
| 28 | 1985 | Dwight Gooden | 35 | 69.7 |
| 29 | 1910 | Walter Johnson | 42 | 69.7 |
| 30 | 1911 | Ed Walsh | 37 | 69.6 |
| 31 | 1965 | Sandy Koufax | 44 | 69.5 |
| 32 | 1908 | Cy Young | 33 | 69.5 |
| 33 | 1940 | Bob Feller | 37 | 69.4 |
| 34 | 1936 | Lefty Grove | 30 | 69.3 |
| 35 | 1915 | Walter Johnson | 39 | 69.3 |
| 36 | 1902 | Cy Young | 43 | 69.3 |
| 37 | 1932 | Lefty Grove | 30 | 69.2 |
| 38 | 1999 | Randy Johnson | 36 | 69.2 |
| 39 | 1972 | Steve Carlton | 41 | 69.2 |
| 40 | 1936 | Carl Hubbell | 36 | 69.1 |
| 41 | 1969 | Bob Gibson | 35 | 69.1 |
| 42 | 1914 | Russ Ford | 26 | 69.0 |
| 43 | 1971 | Tom Seaver | 35 | 68.9 |
| 44 | 1914 | Walter Johnson | 40 | 68.9 |
| 45 | 1986 | Mike Scott | 39 | 68.9 |
| 46 | 1911 | Vean Gregg | 26 | 68.8 |
| 47 | 1914 | Claude Hendrix | 37 | 68.8 |
| 48 | 1902 | Bill Bernhard | 25 | 68.7 |
| 49 | 1937 | Lefty Gomez | 36 | 68.6 |
| 50 | 1935 | Cy Blanton | 30 | 68.6 |
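For the curious, a ranking like the one above boils down to grouping starts by pitcher-season, averaging, and applying the start minimum. The sketch below assumes a hypothetical input layout of one (pitcher, year, adjusted Game Score) tuple per start; it is not the actual pipeline behind the table, and the sample data is made up.

```python
from collections import defaultdict


def top_seasons(starts, min_starts=25, limit=50):
    """Rank pitcher-seasons by average adjusted Game Score.

    starts -- iterable of (pitcher, year, adjusted_game_score) tuples,
              one per start (hypothetical layout for this sketch)
    Returns a list of (year, pitcher, start_count, average) tuples.
    """
    scores = defaultdict(list)
    for pitcher, year, score in starts:
        scores[(pitcher, year)].append(score)

    rows = [
        (year, pitcher, len(vals), sum(vals) / len(vals))
        for (pitcher, year), vals in scores.items()
        if len(vals) >= min_starts
    ]
    # Highest average first.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:limit]


# Made-up starts, with the minimum lowered so the toy data qualifies.
sample = [
    ("Pitcher A", 2000, 80), ("Pitcher A", 2000, 70),
    ("Pitcher B", 2000, 90),
    ("Pitcher C", 1999, 60), ("Pitcher C", 1999, 62),
]
for row in top_seasons(sample, min_starts=2):
    print(row)
```

Pitcher B's lone start has the best score in the sample but falls short of the minimum, which is exactly the kind of small-sample season the 25-start cutoff is meant to exclude.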
If I may say so, that’s a pretty fun list. You can definitely see the effects of the change in pitcher usage: Russ Ford makes as many appearances as Roger Clemens and Sandy Koufax combined. But I don’t mind letting the deadball pitchers have a moment in the sun; you can already infer from the table of league averages that their numbers are going to have some serious air let out moving forward.
Also, even with
the deadball advantage on the table, the difference between Pedro Martinez’s
highest average and anyone else’s highest average is the same as the gap
between #3 and #23 on the list. Pedro, it turns out, was pretty good.
Pedro was not,
however, especially durable; note that his exceptional 2000 season included
only 29 starts, while several of the seasons behind him had totals in the high
30s and a few even cleared 40. Next time, we’ll take our context-adjusted Game
Score and work on turning it into a more robust measure of seasonal
performance, one that balances excellence and availability.