Overview of Sabermetrics by RoCoBrewCrew
What you're looking at when you talk about sabermetrics is how many
wins above or below average a certain player will give your team. I'm
going to use Bill Hall as an early example here, as he's a guy Jimbo
talked about the other day. According to saber stats, Bill is worth 0.5 wins to
the team.  Someone pointed out that Hall had 2 game-winning hits
that directly led to 2 wins, so how can he be worth only .5 wins?  I'll
explain below.
The easiest, quickest way to predict wins is to go by the rule of 10.
This means that over the course of a season, every 10 runs that you
score more than you allow, you'll win one more game. A hypothetically
average team that allows exactly as many runs as it scores should, in
theory, win 81 games. A team that scores 800 runs and allows 700 runs
should, again, in theory, be a 91 win team. What does this have to do
with sabermetrics?  Simple: using OXS (on-base percentage times slugging
percentage), you can figure out quite accurately how many runs a player
creates.
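
To make the rule of 10 concrete, here is a minimal Python sketch. The function name, the 162-game season, and the runs_per_win parameter are assumptions added for illustration; the post itself only states the rule.

```python
def expected_wins(runs_scored, runs_allowed, games=162, runs_per_win=10.0):
    """Estimate season wins from run differential using the rule of 10."""
    # A team that scores exactly as many runs as it allows is assumed
    # to win half its games (81 of 162).
    average_wins = games / 2
    return average_wins + (runs_scored - runs_allowed) / runs_per_win


print(expected_wins(700, 700))  # 81.0
print(expected_wins(800, 700))  # 91.0
```

These are the two cases from the paragraph above: an even run differential gives 81 wins, and a +100 differential gives 91.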
Runs created are DIFFERENT than runs produced. Everyone knows the RBI +
Runs - Homeruns formula, but that's not entirely accurate for a variety
of reasons. RBI and Runs are important stats of course, but they're
counting stats that depend quite a bit on the performance of the hitters
around a given player. Any guy who hits .260 with 20 homers SHOULD
drive in 100 runs in an average major league lineup. By comparison, the
same guy might only drive in 60 if he hits 8th, or he plays for a very
poor offensive team.
Get to the point, RoCo!  OK, runs created.  To figure out how many runs
a player 'creates', the easiest formula is OXS.  Take at bats, multiply
by SLG%, and multiply again by OBP%.  Example:  Batter A has 500 at
bats, a .350 OBP%, and a .450 SLG%.  He has 'created' 79 runs
(500 x .350 x .450 = 78.75).
That's a quick and dirty formula, and Jimbo will surely give a more
accurate and detailed one, but it keeps things short here.  What does
that mean, though?  Obviously a guy who slugged .450 and hit, say,
cleanup had more than 79 RBI, so how come he only 'created' 79 runs?
Take this example. Batter A reaches 1st on an error by the shortstop.
Batter B rips a double down the line, and Batter A stops at 3rd. Batter
C hits a weak tapper to 2nd, and the runner on 3rd scores. Batter A
gets credit for a Run, Batter C gets credit for an RBI, but batter B,
who arguably did the most work to create that run, gets credit for
nothing. This formula credits those guys by giving a set value of runs
expected for each specific offensive event, be it an out, a double, a
walk, anything.
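
The quick OXS estimate described above can be sketched in a few lines of Python. The function name is made up for illustration, and this is only the "quick and dirty" version from the post, not a full runs created formula.

```python
def runs_created_oxs(at_bats, obp, slg):
    """Quick OXS estimate: at bats times on-base % times slugging %."""
    return at_bats * obp * slg


# Batter A from the example above: 500 AB, .350 OBP, .450 SLG
rc = runs_created_oxs(500, 0.350, 0.450)
print(round(rc))  # 79
```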
Now, I know a lot of people are skeptical of OPS, and what it means, but
the formula has been used to figure out how many runs every team for
every season in the course of major league history SHOULD have scored,
and then compared to how many they DID score, and the formula is an
astounding 97.5% accurate in predicting runs. That's hard to argue
with.
Lastly, I'll touch on individual performance quick. Let's use the same
guy from above, with the hypothetical .800 OPS. Let's say he's a 2nd
baseman. Dividing his 79 runs created by his 500 at bats, we find he
creates, on average, .158 runs every time he hits.  To find out how much
above or below average he is, you find the league average for
his position (for 2nd basemen this year, it's .132 runs per AB).
Then you multiply his runs per at bat by his at bats, coming up with
the 79 runs I mentioned before.  Now multiply the league average runs
per AB by his ABs and you'll come up with 66 runs created, for a
difference of 13 runs.  If that number is for a full season, that means
he's 1.3 wins better than an average 2nd baseman. Of course, that
doesn't mean he was directly responsible for 1.3 wins. It means that if
he was on a team that was otherwise COMPLETELY average at every other
position, hypothetically they'd be an 82 win team, rather than 81. Of
course, this hasn't factored in defense and base stealing, but defense
has still proven to be incredibly hard to accurately quantify
statistically.  There are also other factors that Jimbo has punched in,
such as park effect, that I haven't explained, but hopefully this covers
the basics and the numbers at least make some sense!
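
The 2nd baseman example above works out as follows in Python. This is a sketch only: the function and variable names are invented for illustration, and it leaves out defense, base stealing, and park effect, just as the walkthrough does.

```python
def wins_above_average(at_bats, runs_created, league_rc_per_ab, runs_per_win=10.0):
    """Compare a hitter with the league average at his position."""
    player_rate = runs_created / at_bats          # runs created per at bat
    average_runs = league_rc_per_ab * at_bats     # what an average player creates
    runs_above = runs_created - average_runs      # runs above (or below) average
    return player_rate, runs_above, runs_above / runs_per_win


# 500 AB, 79 runs created, league average of .132 runs per AB at 2nd base
rate, runs_above, wins = wins_above_average(500, 79, 0.132)
print(f"{rate:.3f} {runs_above:.0f} {wins:.1f}")  # 0.158 13 1.3
```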
Additional comments by JimboWis:
What is SABR?  The Society for American Baseball
Research (SABR) is an organization of
baseball enthusiasts who perform and discuss research in various areas of the
game. Statistics is one of these interests.
Why SABRmetrics?  Ever since the time Alexander
Cartwright laid down the first rules governing baseball, the game has naturally
been quantified by 'counting' statistics. Runs, safety hits, errors, etc.
And ever since those early days baseball fanatics from all over have used
numbers to qualify and debate the value of baseball players. What
SABRmetrics attempts to do is summarize the available statistics into one number
that can be assigned to one player, to be compared with the values of other
players from different positions and different teams. One can compare the
worth of an outfielder to that of a pitcher.
Why Runs Created? The purpose of playing a
season's worth of baseball games is to accumulate as many wins as
possible. So what is the one variable we can isolate that has the
strongest correlation to winning?  Is it triples?  Stolen bases? 
The answer is runs. The more runs you score, the better chance you
have of winning. Allowing fewer runs also increases your probability of
victory. Since we know scoring is most strongly associated with winning,
we need to come up with a way to calculate how many runs each player is worth.
Linear Weights or OBPxSLG?  Above, RoCo mentions
the widely used OXS formula.  ESPN uses it on this website.  I believe
STATS, Inc. uses it. Maybe also the Total Baseball and Bill James'
books. However, I prefer to use a different algorithm. The problem I
have using OXS is we are trying to estimate a 'counting' statistic by
multiplying and dividing numbers. I prefer to use a linear algebra
approach. A 'weight' is assigned to each key offensive statistic,
depending how valuable the stat is (double, stolen base, etc.). These
weights are calculated through regression analysis. Pete Palmer in his
1982 book calculated these weights and that is what I use. Palmer's
formula calculates how much above or below the league average a player is.
Apply this formula to the AL or NL totals from any recent year and you can see
it is fairly accurate. What I do with that formula from there is to make a
slight adjustment so it does normalize the league stats, then add the league
average runs per plate appearance so I can calculate runs created for any
hitter. Note: we only need to do this for offensive players. For
pitchers we can use earned runs allowed.
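
A rough Python sketch of the linear weights idea follows. Everything specific in it is an assumption: the weights are illustrative placeholders in the neighborhood of commonly quoted linear weights values, not the exact figures from Palmer's book or the ones I use, and the stat line and league rate are hypothetical. Adding the league rate times plate appearances is one reading of the adjustment described above.

```python
# Illustrative run values per event (placeholders, NOT Palmer's exact weights).
EXAMPLE_WEIGHTS = {
    "single": 0.46,
    "double": 0.80,
    "triple": 1.02,
    "home_run": 1.40,
    "walk": 0.33,
    "stolen_base": 0.30,
    "caught_stealing": -0.60,
    "out": -0.25,
}


def runs_above_average(events, weights=EXAMPLE_WEIGHTS):
    """Sum of (weight x count) over each offensive event."""
    return sum(weights[name] * count for name, count in events.items())


def runs_created(events, plate_appearances, league_runs_per_pa,
                 weights=EXAMPLE_WEIGHTS):
    """Runs above average plus what an average hitter creates in the same PA."""
    return (runs_above_average(events, weights)
            + league_runs_per_pa * plate_appearances)


# Hypothetical season line, for illustration only
season = {"single": 100, "double": 30, "triple": 3, "home_run": 20,
          "walk": 50, "stolen_base": 10, "caught_stealing": 4, "out": 350}
print(runs_above_average(season))
print(runs_created(season, plate_appearances=553, league_runs_per_pa=0.12))
```

Because the weights are passed in as a dictionary, swapping in values from a different regression only means changing the table, not the functions.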
Home Park Factor.  One thing many people
overlook, which surprises me, is that a team will play half its games in the same
stadium.  Baseball as a sport is unique in the fact that the playing areas
for each of the teams are different.  This will introduce tremendous bias to
individual and team statistics. I feel it is very important to adjust runs
created and runs allowed for home park factor to eliminate any home park biases.
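
The text above does not spell out how the adjustment is made, so the following Python sketch shows just one simple possibility. It assumes a hypothetical park_factor, defined as run scoring in the home park relative to a neutral park (1.00 = neutral), and applies only half of it because only half the games are played at home.

```python
def park_adjusted_runs(runs, park_factor):
    """Scale a run total to a neutral park (one simple approach, assumed here)."""
    # Half the schedule is on the road, so only half of the home park's
    # effect shows up in the season total.
    effective_factor = (1.0 + park_factor) / 2.0
    return runs / effective_factor


# A hitter-friendly park (factor above 1.00) scales the total down
print(park_adjusted_runs(110, 1.20))
```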
Wins/Losses. Now that we can calculate runs
above/below average, we can go one step further. How many wins does each
player contribute to the team?  As RoCo mentioned earlier, a good rule of
thumb is that an additional ten runs is worth one additional win.  This is calculated
through regression analysis.  Try it yourself: take last year's standings
and graph them on a sheet of paper, wins on the vertical axis and run
differential on the horizontal axis.  See what you come up with.  A
player who is ten runs better than average is worth one more win to the
team than an average player.
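
If you would rather fit the line than draw it by hand, here is a minimal least-squares sketch in Python. The numbers below are made up purely to show the mechanics; they are not real standings, so plug in an actual season's run differentials and win totals to test the rule.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up (run differential, wins) pairs, for illustration only
run_diffs = [-120, -60, -10, 0, 45, 100]
wins = [70, 74, 81, 80, 86, 92]

slope, intercept = fit_line(run_diffs, wins)
print(f"wins per run: {slope:.3f}")
print(f"runs per win: {1 / slope:.1f}")
print(f"wins at zero differential: {intercept:.1f}")
```

The slope is wins per run, so its reciprocal is the number of runs it takes to buy one win.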
SABRmetrics can be a fun way to analyze and discuss players'
merits: who should be MVP, ROY, and so on.