These aren’t the stats you’re looking for

A post about Star Wars and baseball that does not mention R.A. Dickey.  Except for right there.  Oops.

We’re knee-deep in the ides of March, which, for all of you who are not Roman emperors, means lengthy discussions of the rites of spring training.  It’s the time of year when every injury, every car-flipping DUI, every sub-par outing, every billion-dollar lawsuit, and every sky-high batting average is completely blown out of proportion by sportswriters with too much time on their hands and too little actual news.  What a way to make a living.

As for me, I’m stuck in fast-moving bumper-to-bumper traffic thinking about a Twitter discussion about the significance of spring training stats, of which I probably saw less than half before throwing out a comment and losing interest.  My remark about considering all stats to be meaningless probably went unnoticed but would otherwise have been dismissed as a satirical take on the whole sorry state of affairs.  Which, in some part, it was, but it was also touching on a deeper truth.  I just don’t believe in baseball stats.

This probably comes as a surprise to anyone who knows me.  One of my greatest joys in life is creating overly complicated spreadsheets that twist and turn data into shapes that data has no business being in, proper methodology and best practices be damned.  It’s something that I do sparingly but with precise craftsmanship at work and for fun, always tweaking and polishing the few cases I have deemed worthy of my attention.  And yet, baseball rarely enters into that realm.

The reason for my lack of interest in baseball stats certainly has nothing to do with my level of interest in the subject matter.  One of the most detailed data analysis projects I have embarked on is about Star Wars figures.  Well, not even that, more like opinions about Star Wars figures.  And before you start thinking that I am some Star Wars geek supreme, I am actually rather indifferent towards the franchise (I lean more toward Transformers, Gundam, and G.I. Joe than anything else).  Sure, I have some (way too many) Star Wars figures, but that’s just because I’m a sucker for quality action figures.  No, when it comes to Star Wars, I’m really in it for the sweet, sweet data.

One of these is about baseball and the other is about Star Wars. If any of it makes sense to you, seek help.

The data in question is from the annual wishlist poll held at Rebelscum.com since 2008.  You see, even with mountains of action figures released over the past 35 years, Star Wars fans are still not satisfied.  There are over a thousand new or updated figures that fans are after, which this poll whittles down to a few hundred that are put up to a vote to assess their popularity among a small sample of the fan community.  It is a statistical paradise that is unmatched in the “childhood interests that adults obsess over” industry.  Well, except maybe for baseball stats.

Baseball statistics discussions were never a part of my childhood.  Sure, I saw them in box scores and on baseball cards, but I never owned a book of stats or knew anyone who ever talked of such matters.  I knew the basics of how to calculate batting averages and ERAs and what was considered a game-winning RBI, but I just didn’t care.  Ironically, these things just didn’t seem real to me then and still seem less real than opinions on merchandise from a work of fiction.

My baseline for what was “real” in baseball isn’t particularly definitive – three years of Little League.  Still, a game’s a game, and that’s my closest first-hand experience with it.  In those three years, not once did we ever think about batting averages, ERAs, OBPs, or even the hits and runs that were inevitably tallied as part of every game (well, except maybe for the game we lost 35-5, but that’s a story for another day); the only numbers that ever held any importance were Ws and Ls.  Now, I’m sure those other stats would have been disappointing if they had been calculated (“Your batting average this year was .083!”), but we won and lost as a team, no matter what any one person contributed.  Oh how simple-minded we were.

That’s not to say that individual contributions don’t combine to form the Ws and Ls.  Given enough data, it is really a simple matter to break the game down into every tiny contribution and develop a set of weightings that will give you a reasonable approximation of the actual events.  Well, simple in concept at least.  It should work, and indeed people have been working at it since the early days of the sport, resulting in today’s various WARs and such.  But something still isn’t right.

The problem with baseball stats really comes down to the foundation they’re built on.  In a defense of “modern” stats vs. “traditional” stats I saw a while back, the simplicity and obviousness of old favorites batting average and earned run average were soundly refuted.  These both look fairly solid on the surface because they are built on simple equations: BA = H/AB and ERA = 9*ER/IP.  Simple, right?  But what’s an at bat?  What’s a hit?  What’s an earned run and who gets credited with it?  Dig a little deeper and you hit a complex set of rules and regulations that culminate in “whatever the official scorer says it is.”  Essentially, “A wizard did it.”
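For anyone who wants to see just how little those two equations do on their own, here is a minimal Python sketch.  The function names and the sample numbers are my own illustration (the 1-for-12 line just reproduces the .083 average joked about earlier), and the innings are assumed to be true fractional innings rather than box-score notation like 6.1.  The arithmetic is trivial; every input is a judgment call made before the math ever starts.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """BA = H / AB. What counts as a hit or an at bat is decided upstream."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """ERA = 9 * ER / IP, with IP given as true innings (e.g. 6 + 1/3)."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


if __name__ == "__main__":
    # Hypothetical season: 1 hit in 12 at bats.
    print(f"{batting_average(1, 12):.3f}")  # 0.083

    # Same 12 at bats, but the scorer rules one reached-on-error a hit instead.
    print(f"{batting_average(2, 12):.3f}")  # 0.167

    # Hypothetical outing: 3 earned runs over 9 innings.
    print(f"{earned_run_average(3, 9):.2f}")  # 3.00
```

One changed ruling doubles the batting average in that made-up example, and nothing in the formula knows or cares why.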

Official scorers are the unseen hand of baseball.  Rarely do you ever see books about official scorers and you almost never see them mentioned except when someone complains that the official scorer cost someone a no-hitter by calling an obvious error a hit.  When you deal in numbers, you need some confidence that those numbers are a solid representation of reality, within certain bounds that do not significantly alter the results of the analysis.  Without confidence in the data, it is impossible to reach a conclusion.  With so much based on “because this guy said so,” traditional stats leave much to be desired.  And really, isn’t it the result that matters most and not the classification?  The case for creating new statistical representations of baseball performances is pretty solid here.

And so we have numerous competing methods of tying actions to outcomes.  But can every instance of a particular event be counted the same?  One of my most hated phrases (which I often use in a tone of mocking contempt that rarely translates into text) is “all things being equal.”  This is a lazy cop-out that basically says “well, it should work if reality would stop being difficult and changing stuff all the time.”  Things are never equal.  You can distill out the most unequal parts, weight the others, and get something that gives the appearance of equality, but you’ll never quite get there.  Equality is an ideal, not an observable set of conditions.  I applaud those who set about making adjustments for parks and team defense and game situation and weather and ball composition and mound height and anything else that has variation across time and space, but I can’t help but wonder if it’s all futility.  I know I can’t really compare data from Original Trilogy figures to data from Expanded Universe figures (due to different finalist selection criteria and voting populations), but where’s the equivalent line for baseball stats?

Maybe I’m overthinking this.  Maybe I should just grab an “Advanced Baseball Stats for Basement-Dwelling Losers” book (do they still make books?) and engage random people on the internet in debates over a player’s value based on extra base hits under a visible full moon.  I just can’t do it though.  I can see the value at the macro level, giving you a starting point for evaluating a player’s career for those ever-popular Hall of Fame debates.  And I can see the micro level of the immediate contributions to a win or a loss.  It’s that big gray area in the middle that bothers me.  I can’t help but wonder if the efforts to develop a Grand Unified Theory of Baseball aren’t doomed to act out Zen and the Art of Motorcycle Maintenance.  I sure don’t want to make a trip to the bunk-bedders, if you know what I mean.

But back to spring training (and a big congratulations to anyone who made it this far).  You know what, who cares if the games don’t count and nobody is playing like it matters?  It’s still baseball, something we haven’t had a whole lot of for a while.  Let’s celebrate the second-rate org filler scrub who’s batting .360.  Let’s not make too big a deal about the stud pitcher with an ERA higher than his age.  Let’s cheer for the great plays, move past the boneheaded ones, and just enjoy the moment.  I’d say that life’s too short to worry about the significance of a bunch of numbers from a game, but that would probably only encourage someone to crunch the numbers on that.  There are more important things to get worked up about, like when Hasbro is going to make a proper Arena Padme.
