Last time, we moved on from discussing the best pitching careers and began the process of using the GSDev system to evaluate the best starting pitcher of each season from 1901-2022, comparing the results to the readily available bWAR and fWAR systems. Specifically, we examined the cases in which at least two of the three methods agreed on the #1 ranking. This time, we’ll be going the other direction, looking at the years in which the three measurements agree on absolutely nothing.
We’ll take these 21 seasons in groups; fortunately for me, they happen to have arrived in convenient chronological clumps. We’ll start with the deadball years. Each pitcher’s name is followed by his rankings in the other two measures (in the same order as the columns: GSDev, then bWAR, then fWAR):
| Year | GSDev Choice | bWAR Choice | fWAR Choice |
|---|---|---|---|
| 1903 | Christy Mathewson (2, 2) | Joe McGinnity (2, 6) | Rube Waddell (3, 5) |
| 1905 | Christy Mathewson (2, 2) | Irv Young (5, 5) | Cy Young (3, 5) |
| 1906 | Mordecai Brown (3, 5) | Vic Willis (10, 7) | Al Orth (5, 2) |
| 1910 | Ed Walsh (2, 2) | Russ Ford (3, 4) | Walter Johnson (2, 3) |
Three of the four GSDev choices finished second in both of the other stats, which is a pretty impressive result. The other year (1906) is worth looking at a bit more. Mordecai Brown actually led the NL in both ERA and FIP in 1906, but he gets dinged in bWAR for playing in front of one of the greatest defensive teams ever assembled, and in both WAR systems for a comparatively low innings total of 277.1 (the WAR selections, Al Orth and Vic Willis, both comfortably cleared 300). Brown partly narrowed the innings gap in the World Series, but neither WAR system cares about that (and he finishes #1 in GSDev even without the Series starts).
I am rather surprised to see Willis’s 1906 grade out so low in GSDev, but he comes out at the bottom of a pretty tight grouping. His score of 9.09 ranked #10; #7 posted a 9.20, and #4 had a 10.04. 1906 was frankly a strange year overall, with the two best pitchers in baseball around this time (Cy Young and Christy Mathewson) both having unusually lousy seasons. Our three metrics may not agree on who the best pitcher was, but they unanimously agree that the MLB leader in 1906 produced the lowest league-leading score of the first 20 years of the AL-NL era.
After four disagreements in the first ten years of the sample, the three systems produced at least some level of harmony in each of the next sixteen seasons. That brings us to our next group of three-way arguments, which took place in the still-segregated portion of the live-ball era:
| Year | GSDev Choice | bWAR Choice | fWAR Choice |
|---|---|---|---|
| 1927 | Dazzy Vance (2, 3) | Tommy Thomas (3, 28) | Willie Foster (NA, 3) |
| 1929 | Firpo Marberry (3, 10) | Willis Hudlin (7, 6) | Chet Brewer (NA, 6) |
| 1936 | Carl Hubbell (2, 5) | Lefty Grove (2, 4) | Van Mungo (3, 6) |
| 1938 | Red Ruffing (4, 5) | Bill Lee (2, 6) | Lefty Gomez (3, 12) |
That is quite a bit more chaotic than the last table. The unfortunate lack of Negro League data for GSDev rears its head again in the cases of Willie Foster and Chet Brewer, though if you’re curious, fWAR ranked Lefty Grove second in both 1927 and 1929, making him its top AL/NL pitcher in each year, so there’s no hope for consensus in those years regardless of Negro League inclusion.
GSDev’s most contentious choice in this period is Firpo Marberry in 1929. Marberry is actually best known as the first bullpen ace in baseball history, so it should surprise nobody that his ’29 season included 23 relief appearances along with his 26 starts. Those additional outings would usually boost his WAR relative to his GSDev score. In this case, however, he pitched much better as a starter, with a 2.86 ERA compared to a 4.40 mark in relief (and corresponding differences in his hit and walk rates). Even if you penalized Marberry enough for the shaky extra outings to knock him out of first, the systems still wouldn’t agree: GSDev #2 Red Lucas, who didn’t pitch in relief at all (so no such adjustment would change his score), would simply move up to the top spot.
Two other #1 choices had double-digit rankings in an alternate system in this stretch. Lefty Gomez in 1938 gets a bit of an asterisk, as fWAR technically had him tied for first with Paul Derringer (who ranked #4 in GSDev and #3 in bWAR); the real story here is that fWAR just wasn’t impressed with anyone at all in ’38, as Gomez and Derringer’s MLB-leading figures were a mere 5.0 fWAR. And that leaves Tommy Thomas’s 1927, with bWAR evaluating him as baseball’s best pitcher while fWAR thinks he wasn’t even an ace. If you compare Thomas to Lefty Grove, fWAR sees Grove posting an FIP a full run lower in a tougher park, which is enough to easily overcome a bit of an innings deficit. However, the #28 finish is still a surprise for a high-volume, fairly effective pitcher. Looking at Thomas’s FanGraphs player page, it appears that in several seasons his FIP numbers differ significantly from the ones displayed on Baseball Reference (while other seasons match exactly). I’m not sure what the source of that issue is, but given that bWAR and GSDev are similarly optimistic about Thomas’s efforts, I suspect there’s something funky going on with fWAR here.
Our last major gap between disputed years lasted from 1910 to 1927. This one is even longer; the three systems managed to partially pacify themselves until expansion was on the horizon:
| Year | GSDev Choice | bWAR Choice | fWAR Choice |
|---|---|---|---|
| 1960 | Don Drysdale (2, 3) | Ernie Broglio (4, 12) | Bob Friend (3, 6) |
| 1961 | Whitey Ford (24, 4) | Don Cardwell (17, 12) | Jim Bunning (5, 22) |
| 1962 | Bob Gibson (6, 4) | Hank Aguirre (4, 9) | Camilo Pascual (5, 11) |
After 20-plus years of peace, the early ’60s provide three consecutive seasons of complete disharmony, highlighted by 1961, a year that escalates the petty squabbles of other seasons nearly to the level of a blood feud.
As has been the case in several other disputed years, 1961 was a campaign that produced few standout starters. Whitey Ford’s GSDev score was the fourth-lowest #1 score in the 122 years of data we have (one of the three lower winning scores came in 1929, another year we’ve already seen in this post). And even that relatively unimpressive chart-topper was driven in part by 14 scoreless innings over two World Series starts, without which he’d have finished third and Camilo Pascual (#5 in bWAR, #6 in fWAR) would have claimed the top spot.
But that doesn’t explain bWAR’s intransigence. GSDev and fWAR are comparatively cordial in 1961, both placing the other’s selection reasonably high on the list. Contrariwise, bWAR boots both Ford and Jim Bunning out of its top 20, and opts instead for Don Cardwell, who places outside the top 10 in each of the other systems. As you might expect, this comes entirely down to the fielding adjustment. Cardwell’s Cubs apparently cost him nearly half a run per 9 innings, while Ford’s Yankees were better than average by a similar margin and Bunning’s Tigers were nearly as good. And yet, Cardwell’s BABIP allowed was .271, lower than the MLB average, and only six points higher than Ford’s .265 (Bunning’s was .259). FIP tells a similar tale; Cardwell’s 3.61 was an improvement on his ERA (3.82), but still worse than Bunning’s 3.23 or Ford’s 3.14 (or Pascual’s 3.39, or Sandy Koufax’s MLB-leading 3.00). So yeah, I share in the skepticism of bWAR’s choice here; 1961 was close enough to be anyone’s year, but “anyone” probably still shouldn’t be someone who was merely #10 in the NL in FIP, and outside the top 10 in ERA, WHIP, and basically every other rate category.
As a side note, Bunning’s #1 ranking in fWAR in 1961 was actually a tie with Koufax, who was the only pitcher to rank in the top 5 in all three systems that year. So the dispute wouldn’t look quite as fervent if Koufax came first alphabetically (which, as best I can tell, is how both B-R and FanGraphs order tied pitchers).
1962 was a year in which I expected to see more discord, as bWAR opted for a pitcher who made 22 starts and 20 relief appearances. But those 22 starts helped Aguirre to an MLB-leading ERA and an AL-leading FIP, and he was generally a bit better as a starter than as a reliever, so GSDev is impressed even though his starts accounted for just under 80% of his work, and fWAR’s complaints are not as vociferous as they might have been.
Our next jump forward is short enough that I was tempted to combine the early ’60s group with the next one, but this table is going to be big enough as it is, and the relatively short gap does cover some notable changes in the league, including the addition of four new teams and a round of playoffs, plus a reduction in mound height. On we go to the middle of the expansion era:
| Year | GSDev Choice | bWAR Choice | fWAR Choice |
|---|---|---|---|
| 1971 | Tom Seaver (2, 2) | Wilbur Wood (4, 4) | Fergie Jenkins (3, 3) |
| 1974 | Gaylord Perry (2, 9) | Jon Matlack (3, 3) | Bert Blyleven (6, 4) |
| 1976 | Frank Tanana (3, 4) | Mark Fidrych (4, 12) | Vida Blue (2, 2) |
| 1977 | Tom Seaver (4, 5) | Rick Reuschel (10, 4) | Dennis Leonard (3, 13) |
| 1981 | Fernando Valenzuela (3, 2) | Bert Blyleven (7, 6) | Steve Carlton (2, 2) |
| 1982 | Mario Soto (3, 2) | Steve Rogers (2, 3) | Steve Carlton (3, 7) |
| 1983 | Mario Soto (3, 13) | John Denny (7, 3) | Steve Carlton (6, 5) |
Wow, does fWAR love Steve Carlton in the early ’80s. (Everyone else does too, but fWAR loves him the most.) It’s worth observing here that bWAR’s #1 spot in 1982 was a tie between Steve Rogers and Dave Stieb (who was #7 in GSDev and #10 in fWAR).
Overall, this group is much calmer than the last one. The main points of contention are Mark Fidrych in ’76 (who famously posted an MLB-best 2.34 ERA despite striking out fewer than 4 batters per 9 innings, a very low rate even for the time), Dennis Leonard in ’77 (whose Royals apparently had a good defense despite Leonard allowing 18 unearned runs and posting an FIP solidly lower than his ERA), and Mario Soto in ’83 (whose FIP jumped by nearly a run from the year before while his ERA and hit rate both went down).
And from there, we get to another drought, followed by our final grouping:
| Year | GSDev Choice | bWAR Choice | fWAR Choice |
|---|---|---|---|
| 2003 | Jason Schmidt (6, 5) | Roy Halladay (4, 3) | Mark Prior (3, 4) |
| 2007 | Josh Beckett (2, 4) | Roy Oswalt (20, 11) | Jake Peavy (3, 7) |
| 2008 | Roy Halladay (7, 4) | Tim Lincecum (2, 2) | CC Sabathia (4, 5) |
Roy Oswalt was a terrific pitcher; the GSDev career rankings see him as a borderline Hall of Fame candidate. But in 2007, he had the highest WHIP and lowest K rate of the first decade of his career, the highest walk rate of any season until his last, and his lowest K/BB ratio ever. Houston’s defense that year was terrible (according to bWAR), despite having been good in the surrounding seasons, and despite Oswalt’s ERA being much better than his FIP (the team as a whole was almost exactly even in that regard, 4.70 ERA to 4.73 FIP). So as good as Oswalt is generally, I’m not buying this particular year as best-in-baseball material.
2008 is at least a mildly interesting year for one particular reason: it is one of two seasons in which the #1 ranking in GSDev changed hands due to the addition of wild pitches to the Game Score calculation. Roy Halladay and Tim Lincecum had a close race; Halladay threw only 4 wild pitches while Lincecum led the majors with 17. That race was also very nearly decided by the postseason, as #4 Sabathia got blown up in his lone playoff start; otherwise he would have finished #2, ahead of Lincecum (and a good October start would have pushed him to #1).
This is our last group of discordant seasons; since 2008, at least two of the three systems have agreed every year (with one near-exception in 2016, when Clayton Kershaw led in GSDev and managed a tie for first in fWAR). For what it’s worth, this appears very likely to still be the case through 2025, as the provisional leaders in GSDev in ’23, ’24 and ’25 all led in one WAR or the other (though not both; bWAR and fWAR haven’t agreed on a full-length season leader since 2013).
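The numbers overall point to a trend similar to what was shown last time: GSDev agrees more with each WAR system than the two WAR systems agree with each other. Here is a table of the median rankings that each system’s #1 pitcher received in the other two systems across these 21 disputed seasons (each column is one system’s leaders; each row is the system doing the ranking):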
| | GSDev Leader | bWAR Leader | fWAR Leader |
|---|---|---|---|
| GSDev Rank | | 4 | 3 |
| bWAR Rank | 3 | | 5 |
| fWAR Rank | 4 | 6 | |
The GSDev-associated rankings in both directions clustered around 3 and 4, while the WAR systems placed each other’s leaders a couple spots lower. Averages are trickier to calculate, given the Negro League seasons involved, but even factoring those in, bWAR and fWAR are still more hostile to each other than they are to GSDev. Those differences also persist regardless of which pitcher you select from the tied seasons mentioned earlier in the post (’38 and ’61 in fWAR, ’82 in bWAR). Read into that what you will, but I’m comfortable interpreting it as GSDev taking fewer outlandish positions in seasonal rankings than its counterparts.
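For anyone who wants to check the median table (or try other summaries), here is a short Python sketch that recomputes it from the rankings listed in the five season tables. It is purely an illustration, not part of GSDev itself: the data layout and the names `SEASONS`, `rank_of` and `cross_ranks` are my own hypothetical choices. The tied seasons use the pitchers shown in the tables, and the two Negro League fWAR picks (Foster and Brewer), which have no GSDev rank, are simply skipped, which is one assumed way of handling the averages.

```python
from statistics import mean, median

SYSTEMS = ("GSDev", "bWAR", "fWAR")

# Hypothetical layout: year -> (GSDev pick, bWAR pick, fWAR pick). Each pick holds
# that pitcher's ranks in the *other* two systems, kept in GSDev/bWAR/fWAR order.
# None marks a Negro League pitcher with no GSDev data.
SEASONS = {
    1903: ((2, 2), (2, 6), (3, 5)),
    1905: ((2, 2), (5, 5), (3, 5)),
    1906: ((3, 5), (10, 7), (5, 2)),
    1910: ((2, 2), (3, 4), (2, 3)),
    1927: ((2, 3), (3, 28), (None, 3)),
    1929: ((3, 10), (7, 6), (None, 6)),
    1936: ((2, 5), (2, 4), (3, 6)),
    1938: ((4, 5), (2, 6), (3, 12)),
    1960: ((2, 3), (4, 12), (3, 6)),
    1961: ((24, 4), (17, 12), (5, 22)),
    1962: ((6, 4), (4, 9), (5, 11)),
    1971: ((2, 2), (4, 4), (3, 3)),
    1974: ((2, 9), (3, 3), (6, 4)),
    1976: ((3, 4), (4, 12), (2, 2)),
    1977: ((4, 5), (10, 4), (3, 13)),
    1981: ((3, 2), (7, 6), (2, 2)),
    1982: ((3, 2), (2, 3), (3, 7)),
    1983: ((3, 13), (7, 3), (6, 5)),
    2003: ((6, 5), (4, 3), (3, 4)),
    2007: ((2, 4), (20, 11), (3, 7)),
    2008: ((7, 4), (2, 2), (4, 5)),
}


def rank_of(picks, leader, system):
    """Rank that `system` gave to the #1 pitcher picked by `leader` (indices into SYSTEMS)."""
    others = [s for s in range(len(SYSTEMS)) if s != leader]
    return picks[leader][others.index(system)]


def cross_ranks(leader, system):
    """All known ranks `system` gave to `leader`'s #1 picks; missing (None) ranks are skipped."""
    ranks = (rank_of(picks, leader, system) for picks in SEASONS.values())
    return [r for r in ranks if r is not None]


if __name__ == "__main__":
    for system in range(len(SYSTEMS)):
        for leader in range(len(SYSTEMS)):
            if leader == system:
                continue  # a system's own leader is its #1 by definition
            ranks = cross_ranks(leader, system)
            print(f"{SYSTEMS[system]} rank of {SYSTEMS[leader]} leaders: "
                  f"median {median(ranks)}, mean {mean(ranks):.2f}, n={len(ranks)}")
```

Running it prints a median, mean and sample size for each of the six pairings. The medians match the table above, and the means follow the same pattern, with the two WAR-versus-WAR pairings coming out highest.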
With that, we bring this series of posts to a close, at least for now. I’ll plan to revisit the GSDev rankings when 1898-1900 numbers become available (hopefully later this offseason), and may touch on some other side topics at that time. But for now, we’ve reached a natural stopping point. If time permits, my next post(s) will probably be an update to the positional Weighted WAR rankings covering changes due to the 2025 season.