
Monday, July 27, 2020

Baseball Prospectus Secret Sauce: Their Rationale for Ending It Was Incorrect

[NOTE: wrote most of this up before, just getting around to posting now]

Baseball Prospectus did a study for their great book, "Baseball Between the Numbers," on how teams have historically gone deep into the playoffs during the divisional era, answering the question Billy Beane had of why his sh!t didn't work in the playoffs.

They found that, on an overall basis, offensive stats were random and all over the place, meaning that teams got deep into the playoffs defensively, that is, with pitching and fielding.  They narrowed the variables down to three key metrics:  K/9 for the pitching staff, WXRL for their closer, and their fielding metric, FRAA (Fielding Runs Above Average).  The only offensive metric that seemed tied to going deep was stolen base attempts, which they attributed to team speed.

For a few years, BP published annual playoff analyses based on their "secret sauce".  But after a number of failures, they decided to end the analysis, assuming that the playoffs had changed in some way that made it fail as a strategy.

ogc thoughts

But I still believe in the results.  Those annual playoff articles were not done correctly.  Nate used the wrong methodology when he did this article and the others, and nobody at BP caught it.

BP Ranked Playoff Teams Wrong

The problem was that he ranked the teams against each other in just that season, just those playoffs.  That's not how he ranked them in the original study; there, he ranked them among ALL the playoff teams that came before.  That would give a better idea of whether any team stands out or whether the result is more random.  For example, if all the teams in a particular season's playoffs ranked low historically, then somebody still has to win, and the winner is more of a random outcome.

I believe that this is where many analyses and discussions of the playoffs go off the rails:  sometimes the teams in the playoffs weren't actually all that good, historically, but they made the playoffs, and someone eventually wins no matter what, so it becomes like musical chairs.  But BP's analysis found a common linkage among all the teams in divisional playoff history - defense - where teams that did well in a few key metrics had that extra something that enabled them to win it all more regularly than teams that did not rank as well.

Correct Way Was to Compare Historically and By Probability

The correct way of producing this annual playoff article would be to show each team's probability within the whole history of the playoffs.  In the chapter, they looked at the top 10 teams overall by their metrics, and, from what I recall, 8 of the 10 won the World Series, with a 9th having the bad luck of facing another Top 10 team.  It goes both ways:  if two outliers face each other, someone must win and someone must lose, no matter how good or bad they are.  So if a modern team happened to fall into that Top 10, they would appear to be prohibitive favorites, especially if the rest of the field ranked much lower historically.

So the two bits of information I would have provided for each playoff team are their rank among all playoff teams historically, and then a histogram of outcomes for the historical teams in whatever range they happen to fall into.  For example, maybe they ranked in the bottom decile, and none of those historic teams ever won the World Series, or even made the World Series; for a median ranker, maybe X% made the World Series, Y% advanced to the LCS, Z% went only as far as the LDS, and so on.
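To make that concrete, here is a rough sketch in Python of the kind of lookup I have in mind.  The scores and playoff results below are randomly generated placeholders (the real inputs would be BP's secret sauce components), so treat it as an illustration of the bucketing, not the actual study:

```python
# Sketch: bucket historical playoff teams by a "secret sauce" score and
# tabulate how far the teams in each bucket actually went.
import random
from collections import Counter

random.seed(1)

ROUNDS = ["Lost LDS", "Lost LCS", "Lost WS", "Won WS"]

# Fake history: (secret_sauce_score, furthest_round_reached)
history = [(random.gauss(0, 1), random.choice(ROUNDS)) for _ in range(300)]
all_scores = [score for score, _ in history]

def decile(score):
    """Which decile (1 = worst, 10 = best) a score falls into vs. all_scores."""
    below = sum(1 for s in all_scores if s <= score)
    return max(1, min(10, 1 + (below - 1) * 10 // len(all_scores)))

# Historical outcome counts for each decile.
outcomes_by_decile = {d: Counter() for d in range(1, 11)}
for score, result in history:
    outcomes_by_decile[decile(score)][result] += 1

def report(team_name, team_score):
    """Print the historical outcome distribution for this team's decile."""
    d = decile(team_score)
    counts = outcomes_by_decile[d]
    total = sum(counts.values())
    print(f"{team_name}: decile {d} among historical playoff teams")
    for r in ROUNDS:
        print(f"  {r}: {100.0 * counts[r] / total:.0f}% of decile-{d} teams")

# Hypothetical current playoff team with a made-up score.
report("2020 playoff team (made-up score)", 0.9)
```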

It would not be definitive, the way Nate wrote this article, but it would have been more accurate (and, ironically, closer to his work with presidential elections, which dealt in odds), I would bet, once you see how teams ranked that high have performed historically.  Plus, with a probability attached to the grouping each team is in, some Monte Carlo simulations could be run to see how often each team wins the World Series.  Everyone wants certainty, but like Silver does on 538 with elections and polling, he could do the same with playoff teams, giving odds of each winning based on their historical probability.  Results are still random, but it would be clearer when a team was much better than the other playoff teams, relative to history.

Because the reality has been that most of the teams in any particular playoff year are somewhere near the middle, and clustered relatively close to each other.  And if the top teams are pretty similar, who wins becomes more of a random event, a coin flip, which is why the "why doesn't Billy Beane's $h!t work in the playoffs" theory gets a lot of credence and support.  Reporting on the historical probabilities would expose when a team is significantly better than the others, and give a sense of the odds of them winning.  Again, some sort of Monte Carlo simulation would illustrate how likely any team is to win the World Series.
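Here is a toy version of such a simulation.  The eight team "strengths" are invented stand-ins for whatever per-game probability you would assign each club from its historical grouping, and the bracket is simplified (fixed pairings, no wild-card games), so this is a sketch of the mechanics rather than a real forecast:

```python
# Toy Monte Carlo of a simplified 8-team playoff bracket.
import random
from collections import Counter

random.seed(2)

# Per-game "strength" for eight hypothetical playoff teams.
teams = {"A": 0.60, "B": 0.55, "C": 0.52, "D": 0.50,
         "E": 0.50, "F": 0.48, "G": 0.45, "H": 0.40}

def game_win_prob(p_a, p_b):
    """log5-style chance that team A beats team B in one game."""
    return (p_a * (1 - p_b)) / (p_a * (1 - p_b) + p_b * (1 - p_a))

def play_series(a, b, games=7):
    """Simulate a best-of-`games` series and return the winner."""
    need = games // 2 + 1
    p = game_win_prob(teams[a], teams[b])
    wins_a = wins_b = 0
    while wins_a < need and wins_b < need:
        if random.random() < p:
            wins_a += 1
        else:
            wins_b += 1
    return a if wins_a > wins_b else b

def play_bracket(seeds):
    """Simplified bracket: best-of-5 LDS round, then best-of-7 LCS and WS."""
    lds = [play_series(seeds[0], seeds[7], 5), play_series(seeds[3], seeds[4], 5),
           play_series(seeds[1], seeds[6], 5), play_series(seeds[2], seeds[5], 5)]
    lcs = [play_series(lds[0], lds[1]), play_series(lds[2], lds[3])]
    return play_series(lcs[0], lcs[1])

n_sims = 20000
titles = Counter(play_bracket(list(teams)) for _ in range(n_sims))
for team, strength in teams.items():
    print(f"{team} (strength {strength:.2f}): wins it all in "
          f"{100 * titles[team] / n_sims:.1f}% of simulations")
```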

Possible Improvements on Prior Study

And with today's data analytics, perhaps instead of simply ranking the history of playoff teams, advanced techniques could be used to create a cluster tree of expected results, or to group the rankings into ranges, using machine learning.  Or perhaps a new secret sauce for how teams make and win the World Series could be found by redoing the study with modern data analytics.
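As one possible flavor of that, here is a sketch using scikit-learn's KMeans on secret-sauce-style inputs.  The team stats and playoff outcomes are randomly generated placeholders, so this only shows the shape of the analysis, not any real finding:

```python
# Sketch: cluster playoff teams on secret-sauce-style inputs (K/9, closer
# WXRL, FRAA) and see whether deep playoff runs concentrate in any cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

n_teams = 200
X = np.column_stack([
    rng.normal(7.5, 1.0, n_teams),   # staff K/9
    rng.normal(4.0, 2.0, n_teams),   # closer WXRL
    rng.normal(0.0, 25.0, n_teams),  # team FRAA
])
# 1 if the (fake) team reached the World Series, else 0.
reached_ws = rng.random(n_teams) < 0.25

# Standardize so K/9, WXRL, and FRAA are on comparable scales.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(Xz)

for k in range(4):
    mask = kmeans.labels_ == k
    print(f"cluster {k}: {mask.sum():3d} teams, "
          f"{100 * reached_ws[mask].mean():.0f}% reached the WS, "
          f"mean K/9 {X[mask, 0].mean():.1f}, WXRL {X[mask, 1].mean():.1f}, "
          f"FRAA {X[mask, 2].mean():+.0f}")
```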

Though the old secret sauce seemed to work fine the year I tried it out.  For 2010, I tried to replicate what Nate had done in the chapter.  I went through their database, pulled the Top 10 teams noted in the book (they still had WXRL back then), the only ones ever identified, and their stats, and checked how close the Giants were to the top.  And while they were not in the Top 10, I recall them being pretty close.

Furthermore, their reasoning for declaring the findings no longer applicable was an incorrect process as well.  The right way would have been to resurrect the study, include the latest data, and verify that the findings are now something else, which would mean that things have changed.  Instead, and it is ironic for a well-known sabermetric site, they dismissed a finding based on roughly 35 years of divisional playoffs using the small sample size of a few playoffs.  I'm no statistician, but I know enough to know that this is not the right way to dismiss the findings of the study.

Defense Wins Championships

And if you look at other sports, you find defense again to be a common denominator for teams that go all the way and win.  Don Coryell could never figure out how to turn his great Air Coryell offenses into championships, but Bill Walsh tied his offensive principles to George Seifert's defense, built up his defensive talent by drafting Ronnie Lott, among others, and won multiple championships.  In basketball, scoring gets all the attention (much like homers and offense in baseball), and run-and-gun teams, like the ones the Warriors had under Don Nelson, could score a lot of points, but it was the superior defensive unit of the Death Lineup that got the Curry/Kerr Warriors their championships.

In addition, Fangraphs' The Hardball Times had a study on playoff success that, using a different methodology, came to the same conclusion:  defense is core, offense is just hygiene, or contextual.  In that study, the author looked at a large number of team stats covering offense and defense, and the initial result was randomness; but once he filtered out the noise by eliminating the W/L results where one team wasn't that much better than the other, the results were very clear:  pitching and fielding superiority resulted in more wins in the playoffs, while offensive superiority did not.  That makes defense a core capability and offense simply hygiene.
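My rough reading of that filtering approach, in code, looks something like the sketch below.  The series data is invented and the threshold values are arbitrary; the actual study used its own list of stats and its own cutoffs:

```python
# Sketch of the filter: throw out playoff series where the two clubs were
# roughly equal in a stat, then ask how often the clearly superior club won.
import random

random.seed(3)

# 500 fake playoff series, each recorded as (team A's edge in some stat,
# whether team A won).  A mild real effect is baked in so the filter has
# something to find; real data would come from actual playoff series.
series = []
for _ in range(500):
    edge = random.gauss(0, 1.0)                      # e.g. gap in staff K/9
    p_a_wins = 0.5 + 0.1 * max(-1.5, min(1.5, edge))
    series.append((edge, random.random() < p_a_wins))

def superior_team_win_rate(min_edge):
    """Among series where |edge| >= min_edge, how often the edge-holder won."""
    kept = [(edge, a_won) for edge, a_won in series if abs(edge) >= min_edge]
    wins = sum(a_won if edge > 0 else (not a_won) for edge, a_won in kept)
    return wins / len(kept), len(kept)

for threshold in (0.0, 0.5, 1.0, 1.5):
    rate, n = superior_team_win_rate(threshold)
    print(f"min edge {threshold:.1f}: superior team won {100 * rate:.0f}% of {n} series")
```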

As Moore wrote about these terms, as quoted in that article:
Any corporate activity that increases shareholder value is core. Anything that doesn't is context. 
"a business process is core when its outcome directly affects the competitive advantage of the company in its targeted markets. Here is the ground upon which companies must differentiate to win power, and the goal of core work is to create and sustain that differentiation."   
From that line of reasoning, Moore reaches two conclusions: "For core activities, the goal is to differentiate as much as possible.... And the winning approach to context tasks is not to differentiate, but rather to execute them as effectively and efficiently in as standardized a manner as possible."    
This is not to say that contextual activities are ultimately of secondary importance. 
But Moore's point is that excellence in contextual activities won't differentiate your business and will guarantee neither a competitive advantage nor a nod from the shareholders. He refers to these as hygiene.  "Hygiene refers to all the things the marketplace expects you to do well," he writes, "but gives you no credit for doing exceptionally well."
So offense, found to have little correlation to winning in the playoffs in two major studies by BP and FG, does not affect the competitive advantage of the baseball team in the playoffs, whereas pitching and defense, both found by BP and FG to be strongly connected with winning in the playoffs, are core activities.

Thus, teams that want to maximize their chances in the playoffs, and especially to increase their chances of going deep and getting to the World Series, should be focused on pitching, pitching, and more pitching.  Pitching might be fragile, it might be here today and gone tomorrow, and it might be hard to develop, because either you have it or you don't, but any team that is not focusing on pitching superiority as a core competency, along with fielding, is not focused on winning in the playoffs.

Analytical Teams Have Not Been Pitching Focused

Hence why I find that analytical teams go off the rails: they are not focused on pitching as a differentiator.  They treat pitching as a nice-to-have, not a must-have.  They treat pitching as an efficiency exercise, not an effectiveness exercise.

Thus, you see analytical teams spend their best draft bullets (first rounders) on college hitters, because that's the surest bet for hitting on a good player.  They probably think that they can then trade for what they need.  If there's one thing I've learned from following baseball all these years, it's that it's better to have the core things you need beforehand, because when it comes time to trade for them, you are at the mercy of whatever is being offered, and the supply of good pitching is usually lacking.

Besides, if you are analytical, you know that major studies by Matt Swartz at BP and FG, using their respective databases, found that teams generally know who their good players are (i.e., who to keep) and who are not so good (i.e., who to trade).  The implication is that if your team is relying on trades to improve, you are most likely trading away a good player (because that's what you have from drafting) for a player the other team knows has flaws that make him not good enough to keep, meaning that analytical teams end up eating a discount in trades, because that's how the market works in MLB.

Thus, teams that focus on pitching as a differentiator should be overloading on pitching in the draft, using more of their first-round picks, and more of their picks overall, on pitching, much like the Giants did under Sabean/Tidrow.  They would also take on what others would call additional risks to get pitchers they liked, as the Giants did with Lincecum (size), Bumgarner (arm action), and Wilson (TJS).  Many will not pay off, but when they do, you were swinging for the fences, not trying to hit the sure single.

Another sign of efficiency over effectiveness is in how the analytical teams handle their starting pitchers.  My prime example is how the Dodgers handle Rich Hill.   They basically take him out after 2 times through the lineup (roughly 18-20 batters), even if he's really doing well against the other team that day.   

A true analytical team would not stop there; that is just as asinine a policy as BP's recommendation to stop all pitchers at 100 pitches.  In this case, they should have analyzed his starts by how long his effectiveness lasts when he's pitching well, breaking them down by pitching quality, whether PQS, Game Score, or whatever metric they use internally.  Just because most pitchers lose effectiveness after two times through the lineup does not mean that they always lose effectiveness.  Figure out when he's golden, and when he's shaky and should be taken out sooner.
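A rough sketch of that kind of breakdown is below.  The game log is randomly generated and the "good start" flag is a stand-in for whatever quality bucket (PQS, Game Score, or an internal metric) a club would use; a real analysis would run on actual batter-by-batter data:

```python
# Sketch: split a starter's batters faced by times through the order, then
# by whether the start as a whole was a good one, to see whether the
# third-time-through penalty actually shows up in his good starts.
import random
import pandas as pd

random.seed(4)

rows = []
for game_id in range(60):                       # roughly two seasons of starts
    good_start = random.random() < 0.5          # stand-in for a PQS/Game Score bucket
    for bf in range(1, random.randint(18, 30)):
        tto = min((bf - 1) // 9 + 1, 3)         # times through the order, capped at 3
        # Fake outcomes: in this toy data, good starts hold up the 3rd time through.
        p_out = 0.70 if (good_start or tto < 3) else 0.58
        rows.append({"game": game_id,
                     "good_start": good_start,
                     "times_through_order": tto,
                     "batter_retired": random.random() < p_out})

log = pd.DataFrame(rows)
summary = (log.groupby(["good_start", "times_through_order"])["batter_retired"]
              .agg(["mean", "count"])
              .rename(columns={"mean": "out_rate", "count": "batters_faced"}))
print(summary)
```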

There are other factors to consider as well.  They should also look at his history against the team; maybe he gets the top-of-the-lineup hitters out historically, so keep him in longer.  Furthermore, he's a pro, so give him some rope to get more outs and keep the bullpen arms fresher, then pull him once there is danger.  Once he lets a runner on, reconsider, given the score (close game or huge lead).  For example, if there's a big lead, you might be able to get another inning or two out of him.

We Giants fans lived through such a managerial decision when Russ Ortiz was taken out in the 2002 World Series.  It didn't make sense given that he was cruising through the lineup.  It made even less sense once we learned that Robb Nen's shoulder was basically shot to hell, leaving us one less sturdy reliever to come in and shut things down if the relievers before him failed.

Analytical teams seem to treat pitchers as mechanical robots that you can send out there and get a great ERA out of.  Each game is different, each day is different.  Every time a pitching change is made, the manager is taking a risk that the replacement pitcher is worse than the prior pitcher.  If the replacement is good, the odds are pretty good you made the right choice, but you have to be prepared for the possibility that he isn't, and for how the dominoes fall should you have a series of failures.

So these analytical teams seem to be making this decision in the name of effectiveness, but I would argue that they are mistaking the move for effective, because they are leaving potential outs in Hill's arm and taking on additional risk by bringing in their relievers that early.  And increased risk means an increased chance that another run scores, which is less effective.  They have not been weighing that difference properly, the way I see it.  And that costs these teams in the playoffs, which is why their $h!t does not work as well in the playoffs: they are built to win in the regular season, but not the playoffs.
