Welcome to Intentional Rounding

In 2005, Aaron Schatz of Football Outsiders authored an essay in the inaugural issue of the Journal of Quantitative Analysis in Sports (JQAS), wherein he listed the 10 biggest problems facing NFL stat analysts at the time. Seven of the 10 were topic areas of future research, but the crucial three spoke to a fundamental need for better data:

  1. Standardizing and adding detail to play-by-play
  2. Opportunities on defense
  3. Blocking

Nearly a decade later, these problems have been solved for the most part. Aside from a handful of official scorers who still haven’t gotten the memo(s), play-by-play data is standardized across NFL stadiums, and various details have been added (e.g., individual snap counts, air yards, etc.). Meanwhile, public access to all-22 coaches’ film has made the game charting projects by Football Outsiders, ESPN Stats & Info, and Pro Football Focus both easier and more information-dense. At this point, unless the NFL releases tactical data (e.g., actual play calls, individual player assignments, etc.) or deploys tracking technology (a la SportVU and PITCHf/x) — neither of which is happening soon, if ever — we are currently in possession of the best data we’re going to have for the foreseeable future.

If so, then what are the problems facing NFL analytics now? In my view, they’re as follows:

  1. Using more “state of the art” statistical techniques
  2. Assessing the reliability and validity of existing statistics
  3. Not reinventing the wheel

Let’s take these one at a time.

Methods

It frustrates me that much of the NFL research I see on the internet continues to rely on bivariate correlations and ordinary least squares (OLS) regression, both of which were brainchildren born over a century ago. Although this is fine for people who don’t know any advanced techniques, those of us who can break the chain of MS Excel should do so more often.

And you know what? It seems I’m not the only one who believes this. Keith Goldner introduced his Markov model of football in May 2011, while people like Boris Chen, Trey Causey, and Chase Stuart have applied mixture modeling, machine learning, and Bayes’ theorem, respectively, on their sites in the past year or so. I applaud them, and wish to join the revolution.

Trustworthiness

The past decade has seen an arms race in the development of new statistics to evaluate teams and players. It’s been a boon to the NFL experience, but it’s also spawned a monster in the shape of a fan who doesn’t know which numbers to trust. One reason, in my opinion? After new measures are created, their developers rarely give readers and consumers sufficient evidence to convince skeptics that those measures should be trusted. (Again, there are exceptions.)

To be clear, I’m not talking about bringing certain statistics out of their “black boxes.” Capitalism is no red herring in the online stats community, so I understand concealing the finer details of sausage-making. Rather, my frustration is that social researchers of all stripes (e.g., economists, psychologists, educators, etc.) spend a great deal of time assessing and reassessing the validity and reliability (i.e., the trustworthiness) of their measures — even ones that require a licensing fee to use. The typical outcome of these labors is that the cream of the crop gains the widest acceptance. That’s an industry standard we should strive for, and we can achieve it by doing more convincing, not less, especially when the huddled NFL masses are yearning to trust statistics.
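To make the idea of reliability concrete, here is a minimal sketch of one common check — the year-to-year correlation of a stat with itself. Everything here is synthetic and hypothetical (the “stable” and “noisy” stats, the noise levels, the 32 teams), purely to illustrate why a metric that barely correlates with itself across seasons shouldn’t be trusted to measure true skill:

```python
import numpy as np

def year_to_year_reliability(year1, year2):
    """Pearson correlation between a stat measured in consecutive
    seasons -- a rough, commonly used proxy for reliability."""
    return float(np.corrcoef(year1, year2)[0, 1])

# Hypothetical example: the same latent team quality, measured by a
# low-noise stat and a high-noise stat, for 32 teams over 2 seasons.
rng = np.random.default_rng(0)
skill = rng.normal(0, 1, 32)                         # latent team quality
stable = [skill + rng.normal(0, 0.3, 32) for _ in range(2)]
noisy = [skill + rng.normal(0, 3.0, 32) for _ in range(2)]

r_stable = year_to_year_reliability(*stable)
r_noisy = year_to_year_reliability(*noisy)
# The low-noise stat correlates with itself across seasons far better
# than the high-noise one, even though both "measure" the same skill.
```

In practice you’d feed in two actual seasons of a real metric instead of simulated numbers; the point is simply that this kind of sanity check is cheap to run and rarely reported.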

A related point regards validating models, not just measures. For instance, one thing you want to see from, say, an analysis predicting NFL wins is the use of both an in-sample test (i.e., a calibration model) and an out-of-sample test (i.e., a validation model). Again, exceptions exist, but more often I see someone perform a regression analysis on a full sample, and then that’s that. Maybe the model gets updated with new data later, but that doesn’t change what may have been invalid the first time around.
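As a concrete illustration of the calibration/validation distinction, here is a minimal sketch on synthetic data. The features, coefficients, and sample size are all invented; this is not a real wins model, just the holdout pattern itself:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dataset: "wins" driven by two team-level features plus noise.
n = 256
X = rng.normal(size=(n, 2))
y = 8 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 2, n)

# Calibration (in-sample) half vs. validation (out-of-sample) half.
train, test = slice(0, n // 2), slice(n // 2, n)
A = np.column_stack([np.ones(n), X])          # add an intercept column
beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)

def rmse(idx):
    resid = y[idx] - A[idx] @ beta
    return float(np.sqrt(np.mean(resid ** 2)))

rmse_in, rmse_out = rmse(train), rmse(test)
# The out-of-sample error is the honest gauge of predictive skill;
# a large gap between the two is a warning sign of overfitting.
```

Fitting on the full sample and reporting only the in-sample fit — the pattern criticized above — skips the second number entirely, which is exactly the one a skeptic should ask for.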

Redundancy

Those who know me are aware of my No. 1 bugaboo in NFL analytics: Someone comes along and creates yet another draft value chart, as if that hasn’t been done to death already. Now, don’t get me wrong, if the new wheel is an innovation — for instance, it’s the result of advanced statistical techniques — I’m all in favor of it and eager to read about it. Rather, the problem I’m identifying here is two-fold:

  1. There are so many research questions that haven’t been answered, or that no one has even attempted to answer. Therefore, we shouldn’t waste valuable brain power and internet space writing about old news.
  2. Members of the internet NFL analytics community often seem to be disconnected from academia.

Checking to see if a topic’s been covered before is the first step in any research project — journalistic, scientific, or otherwise — so it eludes me how Google often eludes people who write on the internet. On the other hand, I am sympathetic to a lack of awareness of peer-reviewed research. Staying current with academia means getting behind the publishing paywall, which isn’t a universal capability. That said, there are many (ostensibly open) NFL research questions that have already been addressed in the academic literature, largely unbeknownst to the internet masses.

DT : IR :: TL : DR

In NFL analytics, the era of small data is over, but the era of big data is still a long way off. Therefore, we should spend this era of medium data focusing more on (a) optimizing our knowledge and (b) separating the statistical wheat from the chaff. The purpose of this site is to help move the needle in that direction, in solidarity with like-minded others. I don’t know everything (or most things), but I feel obligated to use whatever limited expertise I have in a manner that pushes the envelope. To this end, I’ll be doing the following in the coming days, weeks, months, and years:

  1. Applying advanced statistical techniques
  2. Evaluating the trustworthiness of existing stats
  3. Connecting our internet brain to our academic brain

4 Comments

  1. Congratulations on your website! All the best of luck to you!
    Mike and Mari

  2. “At this point, unless the NFL […] deploys tracking technology (a la SportVU and PITCHf/x) — […] which [isn’t] happening soon, if ever…”

    Two days later…

    USA Today: “Work is underway to install [RFID] receivers in 17 NFL stadiums, each connected with cables to a hub and server that logs players’ locations in real time.”

    Haha! Yes!

    ***

    I’m very excited about this site! Your content is always top-notch. Best of luck!

    ***

    What’s “DT : IR”? Google fails me. Inside joke not meant to be explained, or do you care to share?

    • Heh. Yeah, saw that. Personally, I think it’s a great development, and, although the timing makes me look like a buffoon on the surface, I don’t think it affects my argument much.

      Number one, the tracking data is going to be proprietary from what I understand, so lil ol’ me and lil ol’ you aren’t going to actually analyze it. More likely, it’ll end up producing a situation like the one Michael Lopez described in relation to SportVU, whereby everyone on the outside just has to take Kirk Goldsberry’s word for whatever his analyses say because we don’t have the means to actually replicate his work.

      Number two, even if data that awesome went public, I’d sure as hell hope we wouldn’t spend the following decade running correlations and regressions ad infinitum.

      p.s. Thanks for the kind words!

      p.p.s. The DT : IR :: TL : DR thing is just me using the heading of the summary section to make fun of my tendency to ramble on forever. In analogy form, I (DT) am to this website (IR) what too long (TL) is to didn’t read (DR).

  3. Pingback: When Does Yards Per Attempt Stabilize? - Intentional Rounding
