So far in this series, we’ve reviewed the existing pitcher rating systems, introduced and refined Game Score as the basis for an alternative method, introduced deviations in Game Score as the structure of a new metric, and examined the single-season results of Game Score Deviations (GSDev), first laying out the scale and then running through the top 100 single-season scores from 1901-2022. Now, it’s time to start working on what most of you have probably been looking forward to all along: career evaluations.
The first thing I tried in forming a career ranking was simply adding up the single-season GSDev numbers and seeing who has the highest total. Appealing as the simple option is, in the case of GSDev it doesn’t really work. Unlike WAR, the metric isn’t additive (for instance, a player traded midseason will have a combined GSDev for the year that doesn’t equal the sum of his GSDev scores for each team), and mediocre seasons score too high relative to very good years to make straightforward addition viable. If you take, say, three copies of Hipolito Pichardo’s 1993 season (7-7, 4.00 ERA, 3.7 K/9 in 155.1 innings as a starter; 5.35 GSDev), their combined 16.05 would exceed the GSDev total of his teammate Kevin Appier (18-8, 2.56 ERA, 2.90 FIP, 238.2 innings; 15.89 GSDev, one of the top 100 scores of all time). I suspect you could not have gotten Appier in trade for three Pichardos that year. As such, adding up raw GSDev scores gives the kind of results you might expect from a system that proportionately overvalues mediocre seasons: things like Don Sutton (with his long career and relatively low peak) on the edge of the top 10.
If the scaling doesn’t work when adding up the basic scores, what if we don’t use the raw numbers at all? Why not just use ordinal seasonal rankings? There’s some mileage to be gained here, particularly if you scale for league size; I’m especially fond of designating pitchers as “ace” (among the top N pitchers of the year), “top-half ace” (top N/2), or “top-quartile ace” (top N/4), where N is the league size. Only six pitchers had at least 15 ace seasons: alphabetically, Grover Cleveland Alexander, Roger Clemens, Walter Johnson, Nolan Ryan, Tom Seaver, and Warren Spahn. Even fewer had at least 10 top-quartile ace years: Clemens, Lefty Grove, and Randy and Walter Johnson. Those are pretty reasonable lists of great pitchers.
But much as I like toying with ordinal rankings, ultimately not all #1 seasons are the same, nor are all #5 or #10 or #N/2 efforts. Going back to the top 100 seasons list, there were 28 years from 1901-2022 that included multiple top-100 efforts, accounting for over 60% of the list, while 56 different seasons didn’t include a single top-100 entry. This could vary wildly even over a brief timespan; from 1966-71, MLB produced 2, 0, 2, 1, 0, and 2 top-100 seasons, respectively. This remains true if you go further down the all-time seasons list. While 1939 has two top-100 seasons, the preceding year produced no seasons in the top 500. Equating Bob Feller’s 1939 (16.92 GSDev) to Red Ruffing’s 1938 (12.20) just because they both led the league is unsatisfying at best, and disingenuous at worst. (As a bonus note, Ruffing’s 1939 score was also 12.20, coming in ahead of 1938 by a tiny fraction. He finished seventh in MLB that season.)
So what are we to do with the seasonal numbers if raw addition doesn’t work? I can think of two options. (Well, I can think of a lot more than two, but I’m going to introduce the two that I like the most.) But in order to introduce them, I should probably pick a pitcher to use as an example. Let’s take someone inoffensive – say, a pitcher who was often an ace (8 times) and a top-half ace (7), but rarely made the top quartile (once, with a fourth-place finish). Someone with a fairly long career (between 500 and 600 starts), not a ton of regular season awards but some postseason success. Someone who’s been retired for around 30 years, so people are reasonably likely to remember him, but presumably any controversy associated with him has long blown over. Jack Morris, come on down!
Here are Morris’s GSDev totals and yearly ranks in their raw form:

| Year | Starts | Adj GS2 | GSDev | Rank |
|---|---|---|---|---|
| 1977 | 6 | 52.4 | 2.86 | 100 |
| 1978 | 7 | 40.2 | -0.88 | 207 |
| 1979 | 27 | 57.9 | 10.66 | 9 |
| 1980 | 36 | 51.4 | 7.10 | 42 |
| 1981 | 25 | 58.4 | 10.64 | 8 |
| 1982 | 37 | 50.4 | 6.65 | 46 |
| 1983 | 37 | 58.7 | 12.71 | 4 |
| 1984 | 38 | 54.3 | 9.24 | 18 |
| 1985 | 35 | 57.6 | 10.95 | 11 |
| 1986 | 35 | 57.7 | 10.86 | 10 |
| 1987 | 35 | 56.6 | 10.58 | 13 |
| 1988 | 34 | 51.5 | 6.09 | 53 |
| 1989 | 24 | 45.2 | 1.97 | 117 |
| 1990 | 36 | 49.0 | 5.15 | 67 |
| 1991 | 40 | 56.7 | 10.79 | 10 |
| 1992 | 38 | 51.6 | 6.97 | 42 |
| 1993 | 27 | 41.9 | -0.14 | 220 |
| 1994 | 23 | 47.4 | 2.78 | 93 |
| Tot/Avg | 540 | 53.0 | 124.96 | |
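
As a rough check on the ace tiers described earlier, here is a minimal sketch that counts Morris's seasons in each tier from the ranks in the table. It assumes that "league size" N means the number of MLB teams (26 from 1977-92, 28 in 1993-94); that reading is an assumption, and the helper names `league_size` and `tier_counts` are purely illustrative.

```python
# Morris's yearly GSDev ranks, copied from the table above.
MORRIS_RANKS = {
    1977: 100, 1978: 207, 1979: 9, 1980: 42, 1981: 8, 1982: 46,
    1983: 4, 1984: 18, 1985: 11, 1986: 10, 1987: 13, 1988: 53,
    1989: 117, 1990: 67, 1991: 10, 1992: 42, 1993: 220, 1994: 93,
}


def league_size(year):
    """Assumed N: the number of MLB teams (26 from 1977-92, 28 in 1993-94)."""
    return 26 if year <= 1992 else 28


def tier_counts(ranks):
    """Count ace (top N), top-half (top N/2), and top-quartile (top N/4) seasons.

    The tiers are nested, so a top-quartile season also counts as an ace season.
    """
    counts = {"ace": 0, "top_half": 0, "top_quartile": 0}
    for year, rank in ranks.items():
        n = league_size(year)
        if rank <= n:
            counts["ace"] += 1
        if rank <= n / 2:
            counts["top_half"] += 1
        if rank <= n / 4:
            counts["top_quartile"] += 1
    return counts


print(tier_counts(MORRIS_RANKS))
# prints {'ace': 8, 'top_half': 7, 'top_quartile': 1}
```

Under that assumption the counts match the 8, 7, and 1 quoted when Morris was introduced.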
Morris was often very good, but rarely great; his best season (1983) barely cracks the top 400 all-time (to be specific, it’s #396), and you have to go another 400-plus down the list to find his second-best effort. But he also has seven seasons of at least 10 GSDev, which is more than some very, very good pitchers managed (including Bob Feller, Dazzy Vance, Robin Roberts, and Tom Glavine, along with the obvious Koufax-style short-career cohort).
So how do we combine those numbers? The first option goes back to the concept of deviations and its quasi-basis in statistical analysis. When combining multiple deviations in a measurement, the proper approach is generally not to add them together; it is to square them, add the squares, and then take the square root of the sum. The effect of this option is to put a significant emphasis on very high scores. To go back to the Hipolito Pichardo vs. Kevin Appier comparison earlier, Appier’s 1993 GSDev was about three times as high as Pichardo’s. If you square them, that becomes roughly a 9-to-1 ratio. Having a long career still matters, but a long career of unimpressive seasons will have a much harder time catching up to a pitcher with a few standout years.
(As a small corrective measure, negative values will be omitted from the sum of squares, since if you square a negative it becomes a positive and you’re therefore rewarding large negative scores. Yes, there are workarounds for this, but omitting the negatives entirely is the one I prefer, for reasons I have discussed previously in other contexts.)
Here is Morris from the perspective of adding the squares:

| Year | Starts | Adj GS2 | GSDev | SqGSDev |
|---|---|---|---|---|
| 1977 | 6 | 52.4 | 2.86 | 8.2 |
| 1978 | 7 | 40.2 | -0.88 | 0.0 |
| 1979 | 27 | 57.9 | 10.66 | 113.7 |
| 1980 | 36 | 51.4 | 7.10 | 50.4 |
| 1981 | 25 | 58.4 | 10.64 | 113.2 |
| 1982 | 37 | 50.4 | 6.65 | 44.2 |
| 1983 | 37 | 58.7 | 12.71 | 161.5 |
| 1984 | 38 | 54.3 | 9.24 | 85.4 |
| 1985 | 35 | 57.6 | 10.95 | 119.9 |
| 1986 | 35 | 57.7 | 10.86 | 117.9 |
| 1987 | 35 | 56.6 | 10.58 | 111.9 |
| 1988 | 34 | 51.5 | 6.09 | 37.1 |
| 1989 | 24 | 45.2 | 1.97 | 3.9 |
| 1990 | 36 | 49.0 | 5.15 | 26.5 |
| 1991 | 40 | 56.7 | 10.79 | 116.4 |
| 1992 | 38 | 51.6 | 6.97 | 48.5 |
| 1993 | 27 | 41.9 | -0.14 | 0.0 |
| 1994 | 23 | 47.4 | 2.78 | 7.7 |
| Tot/Avg | 540 | 53.0 | 124.96 | 1166.4 |
The square root of that total comes to about 34.15. That doesn’t mean a whole lot without anything to compare it to, except to note that if you take this seriously as a method of determining combined deviation, Morris’s 540-start career is about 43% farther from “pretty bad pitcher” than Pedro Martinez’s 29-start 2000 season.
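
For anyone who wants to reproduce the arithmetic, here is a minimal sketch of the square-and-add method using the rounded GSDev values from the table. The function name `root_sum_square` is illustrative only, and because the inputs are rounded to two decimals the result can differ from the published figures in the last digit.

```python
import math

# Morris's seasonal GSDev values, 1977-1994, copied from the table above.
MORRIS_GSDEV = [2.86, -0.88, 10.66, 7.10, 10.64, 6.65, 12.71, 9.24, 10.95,
                10.86, 10.58, 6.09, 1.97, 5.15, 10.79, 6.97, -0.14, 2.78]


def root_sum_square(scores):
    """Square the positive scores, add them up, and take the square root.

    Negative seasons are dropped rather than squared, so a large negative
    score is never rewarded.
    """
    return math.sqrt(sum(s * s for s in scores if s > 0))


print(f"{root_sum_square(MORRIS_GSDEV):.3f}")
```

With these rounded inputs the result lands a hair above 34.15, in line with the square root of the 1166.4 total in the table.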
For the second option, I’m bringing back an old friend: the peak-weighted sum. Rather than just adding up the raw GSDev values, you reduce the relative importance of each season as you get farther from the pitcher’s peak (season #2 is weighted at 95%, #3 at 90%, on down to #20 and following at 5%). Negatives are, again, removed. This obviously puts some emphasis on peak (as the name implies), but not quite to the same extent as the square-sum method. For example, consider a pitcher who has two seasons, the lesser of which has a GSDev 80% as high as the better. When adding the second-best year, the pitcher’s peak-weighted sum will increase by 76%; his sum-of-squares will go up by only 64%. The specifics of this calculation change depending on the length of the pitcher’s career, but it takes a long career indeed for sum-of-squares to shift in a less peak-heavy direction.
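
The two-season comparison can be worked through in a few lines. This is only a sketch: `second_season_gain` is a made-up helper name, and the third number it returns (the increase after taking the square root) is an extra figure not quoted above, included only to show what happens once the root is applied.

```python
import math


def second_season_gain(ratio, weight=0.95):
    """Fractional increase from adding a pitcher's second-best season.

    ratio  -- second-best GSDev divided by the best GSDev
    weight -- peak weight on the second-best season (95% in this scheme)
    """
    weighted_gain = ratio * weight             # growth in the peak-weighted sum
    squares_gain = ratio ** 2                  # growth in the sum of squares
    root_gain = math.sqrt(1 + ratio ** 2) - 1  # growth after taking the root
    return weighted_gain, squares_gain, root_gain


w, sq, rt = second_season_gain(0.80)
print(f"weighted +{w:.0%}, sum of squares +{sq:.0%}, root sum square +{rt:.0%}")
# prints weighted +76%, sum of squares +64%, root sum square +28%
```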
Here is Morris from the weighted sum perspective:
| Year | Starts | Adj GS2 | GSDev | Weight | WtDev |
|---|---|---|---|---|---|
| 1977 | 6 | 52.4 | 2.86 | 0.35 | 1.00 |
| 1978 | 7 | 40.2 | -0.88 | 0.15 | 0.00 |
| 1979 | 27 | 57.9 | 10.66 | 0.80 | 8.53 |
| 1980 | 36 | 51.4 | 7.10 | 0.60 | 4.26 |
| 1981 | 25 | 58.4 | 10.64 | 0.75 | 7.98 |
| 1982 | 37 | 50.4 | 6.65 | 0.50 | 3.32 |
| 1983 | 37 | 58.7 | 12.71 | 1.00 | 12.71 |
| 1984 | 38 | 54.3 | 9.24 | 0.65 | 6.01 |
| 1985 | 35 | 57.6 | 10.95 | 0.95 | 10.40 |
| 1986 | 35 | 57.7 | 10.86 | 0.90 | 9.77 |
| 1987 | 35 | 56.6 | 10.58 | 0.70 | 7.41 |
| 1988 | 34 | 51.5 | 6.09 | 0.45 | 2.74 |
| 1989 | 24 | 45.2 | 1.97 | 0.25 | 0.49 |
| 1990 | 36 | 49.0 | 5.15 | 0.40 | 2.06 |
| 1991 | 40 | 56.7 | 10.79 | 0.85 | 9.17 |
| 1992 | 38 | 51.6 | 6.97 | 0.55 | 3.83 |
| 1993 | 27 | 41.9 | -0.14 | 0.20 | 0.00 |
| 1994 | 23 | 47.4 | 2.78 | 0.30 | 0.83 |
| Tot/Avg | 540 | 53.0 | 124.96 | | 90.52 |
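
The weighting in the table can be sketched as follows. The function names are illustrative, and the treatment of negative seasons (they keep their slot in the ordering but contribute zero) is an assumption read off the table, where 1993 and 1978 carry weights of 0.20 and 0.15 with a WtDev of 0.00.

```python
# Morris's seasonal GSDev values, 1977-1994, copied from the table above.
MORRIS_GSDEV = [2.86, -0.88, 10.66, 7.10, 10.64, 6.65, 12.71, 9.24, 10.95,
                10.86, 10.58, 6.09, 1.97, 5.15, 10.79, 6.97, -0.14, 2.78]


def peak_weights(n_seasons):
    """Weights of 1.00, 0.95, 0.90, ... by rank from the peak, floored at 0.05."""
    return [max(100 - 5 * i, 5) / 100 for i in range(n_seasons)]


def peak_weighted_sum(scores):
    """Sort seasons best-first, weight each by its rank, and add them up.

    Negative seasons are counted as zero (assumed from the table above).
    """
    ranked = sorted(scores, reverse=True)
    weights = peak_weights(len(ranked))
    return sum(w * max(s, 0.0) for w, s in zip(weights, ranked))


print(f"{peak_weighted_sum(MORRIS_GSDEV):.3f}")
```

Run on the rounded table values, this comes out within a hundredth of the 90.52 total above.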
Without giving away his ordinal position in either approach, Morris is seven places higher in Weighted GSDev than he is in Sum-Squared GSDev. This is probably not a surprise, given his relatively flat, relatively long peak. When looking at the top 200 pitchers, the square root of the sum of the squares (or Root Sum Square, RSS for short) is, on average, about 39% of the weighted sum; Morris sits at 37.7%, which is consistent with weighted sum being his stronger category.
So which measure is better? Given the nature of my objections to the extremizing tendencies of the various pitching WAR systems, it may not surprise you to learn that I tend toward compromise. As such, my career measure of choice will be the average of (weighted sum * 0.39) and root-sum-square. Morris’s overall combined score is 34.73.
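
As a quick check, the combined score can be rebuilt from the two table totals. This is a sketch only: the constant and function names are made up for illustration, and the inputs are the rounded totals (1166.4 and 90.52) rather than unrounded values.

```python
import math

WEIGHTED_SUM = 90.52     # Morris's peak-weighted total, from the table above
SUM_OF_SQUARES = 1166.4  # Morris's squared-GSDev total, from the earlier table
SCALE = 0.39             # average RSS-to-weighted-sum ratio in the top 200


def combined_score(weighted_sum, rss, scale=SCALE):
    """Average the scaled weighted sum with the root sum square."""
    return (weighted_sum * scale + rss) / 2


rss = math.sqrt(SUM_OF_SQUARES)
print(f"RSS / weighted sum = {rss / WEIGHTED_SUM:.1%}")
print(f"combined score = {combined_score(WEIGHTED_SUM, rss):.2f}")
# prints RSS / weighted sum = 37.7%
# prints combined score = 34.73
```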
Where does that place him overall? Next time, we’ll start going through the 100 greatest starting pitching careers of the last twelve-plus decades, and we may just get an answer to that question.