I just ran into Baseball Prospectus' Adjusted Standings report, where they analyze every team's win/loss record and adjust it based on a variety of factors: the Pythagorean record (though they use the Pythagenpat formula), the Pythagorean record using adjusted runs scored (though I wonder why they don't do one with adjusted runs allowed as well), and lastly one adjusted for quality of competition on top of the other adjustments.
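For anyone who has not seen the formulas, here is a minimal sketch of the classic Pythagorean expectation and the Pythagenpat variant. The function names, the 0.287 constant (published versions use values around 0.28 to 0.29), and the sample run totals are my own assumptions for illustration, not BP's actual implementation or real team data.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Classic Pythagorean expectation: RS^x / (RS^x + RA^x), with x = 2 by default."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagenpat_exponent(runs_scored, runs_allowed, games, power=0.287):
    """Pythagenpat exponent: (total runs per game) ** power.

    The 0.287 default is an assumption; the exact constant varies by source.
    """
    return ((runs_scored + runs_allowed) / games) ** power


def pythagenpat_win_pct(runs_scored, runs_allowed, games, power=0.287):
    """Pythagorean expectation using the run-environment-dependent exponent."""
    x = pythagenpat_exponent(runs_scored, runs_allowed, games, power)
    return pythagorean_win_pct(runs_scored, runs_allowed, x)


if __name__ == "__main__":
    # Hypothetical team at the halfway mark (made-up numbers, not real data).
    rs, ra, games = 350, 320, 81
    print(f"Pythagorean (x=2): {pythagorean_win_pct(rs, ra):.3f}")
    print(f"Pythagenpat:       {pythagenpat_win_pct(rs, ra, games):.3f}")
```

The only difference between the two is the exponent: Pythagenpat lets it float with the scoring environment, so each run is worth more in low-scoring leagues.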
Interestingly enough, the Giants should be leading, by about 1-2 games, but over the D-backs, not the D-gers. The D-gers have been playing above their heads and should be close to, if not under, .500, which is where I opined they would be this season. Of course, it helps that Kemp is out, but they are also being propped up by pitchers doing better than they have previously.
Of course, the Giants should be above their Pythagorean: they are currently 4 games over .500 in 1-run games, a margin that regresses to zero for most teams, but as my research showed, Bochy is the rare manager who can manage a winning record in 1-run games. So the regression that BP expects with the first-order win percentage probably will not happen.
Still, even with all that, the Giants are about where they are expected to be, statistically. And the D-backs are the real team to be worried about, as they have been underperforming. I don't know how they are doing it but for the second season in a row, their pitching staff is in shambles the first couple of months, then somehow they regroup, find some new pitchers and start rising up the standings quickly.
They also have playoff odds. The Giants are currently at 83.5%, best in the NL.
thanks ogc - I always enjoy these - I thought they had blocked them out? Guess not.
The playoff odds are a little strange - Detroit has only an 11% chance of winning the division? Really? - but these are good food for thought.
The thing they don't account for (amongst others) is one-run games, as you say. And I'd assume this is how BP does it: it's the results of the season played out a million times. Which means that the results will be pulled towards the team's mean - and they are arguing from the results of the past correlating, in all of MLB, to the "parts" of the past, and thus the future. I have a bit of a problem when one gets too deeply into Pythagorean analysis. I guess "someone" in quotes is arguing against my own point, but "someone" will make a run and make the playoff predictions look silly.
And so - nice chart I saw, Passan at Yahoo maybe, showed two pitchers with almost identical semi-advanced stat lines, one had an ERA over 6, the other at 3.95 or whatever. Lincecum and Darvish. So deviation, player by player, or team by team, plays a big role. Texas may have an 84% chance of winning their division as per BP, but I feel there's more than a 15% chance that the Angels catch them. Not likely in the end, but not 15%.
you're welcome. They recently changed access to their content - they opened up everything that is a year old or older - and thus might have returned some stats like this back to public domain, perhaps. In any case, they are available now.
Detroit's odds must be taken in the context of not only their expected winning percentage, as noted on that webpage, but also their actual record and games behind at that time. While they are expected to win at a .524 clip, which is close to the ChiSox, they are playing under their expected record at 40-42 and are already 4.5 games behind the ChiSox, plus 2.5 games behind the Indians, who are roughly their equal with a .512 expected winning percentage and an actual record close to that at .519, or 42-39. So they not only have to play above their heads just to beat the ChiSox for the division title, they also have to pass the Indians as well, and with both teams expected to play roughly the same, that 2.5 game deficit is pretty hard to overcome in those million simulations.
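To show why a deficit matters so much when the teams are rated about the same, here is a rough sketch of that kind of simulation. It is not BP's algorithm: it treats every remaining game as an independent coin flip at the team's expected winning percentage, ignores schedules and head-to-head games, and splits ties evenly. The Detroit and Indians records and expected percentages are the ones quoted above; the ChiSox record of 44-37 and their .524 expected percentage are my assumptions, chosen only to be consistent with a 4.5 game lead and "close to" Detroit's rating.

```python
import random


def games_behind(leader, trailer):
    """Games behind, given (wins, losses) tuples for the leader and the trailer."""
    return ((leader[0] - trailer[0]) + (trailer[1] - leader[1])) / 2


def simulate_division(teams, season_games=162, n_sims=10000, seed=0):
    """Estimate each team's share of division titles.

    teams maps name -> (wins, losses, expected_win_pct).
    Ties for first place are split evenly among the tied teams.
    """
    rng = random.Random(seed)
    titles = {name: 0.0 for name in teams}
    for _ in range(n_sims):
        final_wins = {}
        for name, (wins, losses, pct) in teams.items():
            remaining = season_games - wins - losses
            # Each remaining game is an independent win with probability pct.
            final_wins[name] = wins + sum(rng.random() < pct for _ in range(remaining))
        best = max(final_wins.values())
        leaders = [name for name, w in final_wins.items() if w == best]
        for name in leaders:
            titles[name] += 1.0 / len(leaders)
    return {name: count / n_sims for name, count in titles.items()}


TEAMS = {
    "ChiSox": (44, 37, 0.524),   # record and rating ASSUMED for illustration
    "Indians": (42, 39, 0.512),  # from the comment above
    "Detroit": (40, 42, 0.524),  # from the comment above
}

if __name__ == "__main__":
    print("Detroit GB of ChiSox:", games_behind(TEAMS["ChiSox"][:2], TEAMS["Detroit"][:2]))
    for name, odds in simulate_division(TEAMS).items():
        print(f"{name}: {odds:.1%}")
```

The exact percentages will not match BP's, since their model has far more inputs, but the shape of the result is the point: with equal ratings, the team that is already ahead keeps most of the titles.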
Well, I understand why they don't account for the one-run games. Nobody significant in the sabermetric community has researched and written on this, and the accepted logic there is that everyone regresses to the mean, which is generally true. Until someone does that research AND publishes it at a significant site, like BP or Fangraphs or SABR, they probably won't.
And even if someone did, it is not likely that they would try to adjust the simulation to account for that, because they would have to really change up the algorithm, which I would expect is very expensive in terms of money and work-hours to implement. They could adjust the Giants' expected runs scored or runs allowed to account for that, but I'm not sure which is the right one to adjust.
I agree, I wouldn't put a lot of stock in these simulations, but still, it is interesting to note. Of course, last season, I'm sure the Giants were rated as highly likely to make the playoffs mid-season as well, but then fell off from there, obviously, so this is not a prediction of the future, only a way to view where a team is at a particular point in time given what is known at that moment.
It can't adjust, for example, for the fact that Sandoval is returning to normal hitting for the rest of the season, whereas earlier he was gone for 6 weeks, then spent a few more weeks weakened by the surgery.
mmm, yes, but I don't see 4 games as being as damning as the projections seem to. And, as you note, if I recall at one point the Giants of 2011 were the most likely to make the playoffs in the NL - so my real point is that the correlation is probably fairly hideous. But then again, the boolean nature of "making the playoffs" only requires one team playing above (or below) their heads, and otherwise, it "works". And you are right - it is only as things stand now, someone gets injured or there is a big trade, it can change a lot, but there's no real way of accounting for "on July 15 the Yankees will acquire Wandy Rodriguez" at 72% and the Bosox at 10%, etc etc.
I agree that one-win situations (or others like it) are much more complex to evaluate. However, I have to laugh - I think the sheer insanity of the degree of analysis in the present day makes "one run games" hardly beyond the pale of study.
My larger point, which is probably a given, is the old correlation not being causation - just because this approach correlates well only means that they've found an approach that correlates well. Making singles count 2% more because it "works better overall" doesn't mean they actually have that impact. It's taking large sample sizes and saying that one smaller sample size has an equal deviation, which is just as much a sin as the other way around.
Which drives me nuts because I'm a math geek. But I gotta have something to do, so I can get involved with the Pirates increasing their "playoff odds" by 28% in one week. It's silly, but it is fun.