With training camps opening this week, fantasy football draft season is quickly approaching. One of the most under-appreciated aspects of using statistics to calculate player projections is forecasting the upcoming season’s offensive environment. Skipping it isn’t a big deal, but your projections will be slightly less accurate because of the omission.
Although not directly scored in fantasy leagues, Adjusted Net Yards per Attempt (ANY/A) is nevertheless the most reliable measure of a QB’s “true” passing prowess. Furthermore, ANY/A is, for the most part, indirectly scored in fantasy leagues because passing yards, touchdowns, and interceptions all feature in its formula. Therefore, it seems useful to try to forecast what leaguewide ANY/A will be in 2015, and that’s exactly what you’re going to read about in this post.
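For readers who want to see how those box-score stats feed into ANY/A, here is a minimal Python sketch of the commonly published Pro Football Reference definition. The 20-yard touchdown bonus, the 45-yard interception penalty, and the sack terms come from that definition rather than from this post, and the stat line at the bottom is invented purely for illustration.

```python
def adjusted_net_yards_per_attempt(pass_yards, pass_tds, interceptions,
                                   sack_yards, attempts, sacks):
    """ANY/A = (yards + 20*TD - 45*INT - sack yards) / (attempts + sacks)."""
    dropbacks = attempts + sacks
    if dropbacks == 0:
        raise ValueError("need at least one pass attempt or sack")
    # Touchdowns add a 20-yard bonus; interceptions cost 45 yards.
    net_yards = pass_yards + 20 * pass_tds - 45 * interceptions - sack_yards
    return net_yards / dropbacks


# Hypothetical stat line, made up for illustration (not a real player or season).
example_any_a = adjusted_net_yards_per_attempt(
    pass_yards=4000, pass_tds=30, interceptions=10,
    sack_yards=200, attempts=550, sacks=30)
print(round(example_any_a, 2))
```

The same function works for a leaguewide figure if you pass in league totals instead of one quarterback's numbers.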
The Basics of ARIMA
The trajectory of ANY/A over the course of NFL history is what’s called “time series” data, and an Autoregressive Integrated Moving Average (ARIMA) model is particularly adept at explaining and forecasting that kind of data. Without getting too technical, an ARIMA model consists of the following:
- An autoregression (AR) component, which represents the extent to which one observation (in this case, ANY/A in a given year) depends on previous observations.
- An integrated (I) difference, which is mostly an esoteric methodological device used to set up the data so that it adheres to a vital statistical assumption.
- A moving average (MA) component, which represents the extent to which one observation depends on previous forecast errors, i.e., how far previous observations deviated from what the model expected.
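To make the "integrated" piece a little less esoteric: in the simplest case it just means modeling year-over-year changes instead of raw levels. A rough Python sketch of that differencing step follows. The helper name and the toy numbers are mine and hypothetical; they are not real ANY/A values.

```python
def difference(series):
    """Return first differences: each value minus the value before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Made-up values chosen only to illustrate the mechanics (not real ANY/A data).
toy_levels = [5.0, 5.5, 5.25, 6.0]
toy_changes = difference(toy_levels)
print(toy_changes)
```

A series of n levels yields n - 1 changes, and the AR and MA pieces are then fit to those changes rather than to the levels themselves.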
Via Pro Football Reference, I collected leaguewide ANY/A data since the AFL-NFL merger in 1970. And after loading said data in R, I plotted it, and saw the following:
It doesn’t take a rocket scientist to see that the long-term trend that continues today started in 1979, which makes real-world sense given the enactment of the Mel Blount Rule the previous year. Therefore, I restricted my time series to leaguewide ANY/A from 1979 to 2014.
Otherwise, there wasn’t much left to do besides running the R syntax and checking the residuals to make sure everything was kosher statistically.[2]
It turns out that the trajectory of leaguewide ANY/A over time is best forecast by a simple exponential smoothing model with a growth factor. This is what that means in English: After eliminating the tendency for the current year’s ANY/A to depend on the previous year’s ANY/A, the only factors that matter for forecasting next year’s ANY/A are
- The reliable, long-term, upward trajectory of ANY/A over time; and
- How far last year’s ANY/A forecast deviated from last year’s actual ANY/A.
For instance, here’s a breakdown of my ARIMA model’s forecast for leaguewide ANY/A in 2015:
- The leaguewide ANY/A for 2014 was 5.73.
- The long-term trajectory of ANY/A since 1979 has seen an increase of 0.04 ANY/A per year.
- Last year’s forecast underestimated the actual ANY/A in 2014 by 0.22.
- My model says that, for every one-unit underestimation last year, there’s a 0.73-unit decrease in the forecast for this year.
- 5.73 + 0.04 + (0.22 × -0.73) = 5.73 + 0.04 - 0.16 = 5.73 - 0.12 = 5.61.
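The arithmetic in that breakdown can be written as a tiny function. This is a sketch in Python rather than the R I actually ran, the function and argument names are hypothetical, and the inputs are the rounded figures from the list above, with the 0.73 entering with a negative sign as in the last bullet.

```python
def one_step_forecast(last_actual, drift, last_error, ma_coef):
    """Next year's forecast: last value, plus trend, plus an error correction.

    last_error is actual minus forecast, so an underestimate is positive.
    """
    return last_actual + drift + ma_coef * last_error


forecast_2015 = one_step_forecast(
    last_actual=5.73,   # leaguewide ANY/A in 2014
    drift=0.04,         # average yearly increase since 1979
    last_error=0.22,    # last year's forecast came in 0.22 too low
    ma_coef=-0.73)      # weight applied to last year's miss
print(round(forecast_2015, 2))  # 5.61
```

Because the weight is negative, a season that beat its forecast pulls the next forecast back toward the long-term trend instead of chasing the spike.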
So there you have it: My ARIMA model forecasts the leaguewide ANY/A in 2015 to be 5.61. Now, since we’re talking about a forecast, there’s going to be error involved. Therefore, I’d be remiss if I didn’t also report that the forecast’s margin of error is 0.32, and so the 95 percent confidence interval for leaguewide ANY/A in 2015 ranges from 5.29 to 5.93.
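The reported range is simply the point forecast plus or minus 0.32. The Python sketch below reproduces the endpoints; the helper name is hypothetical. One caveat on interpretation: under a normal approximation, a 95 percent band is usually about 1.96 standard errors on each side, so here the 0.32 is used directly as the half-width rather than being scaled.

```python
def forecast_interval(point, half_width):
    """Return (lower, upper) bounds around a point forecast, rounded to 2 places."""
    lower = round(point - half_width, 2)
    upper = round(point + half_width, 2)
    return lower, upper


interval_2015 = forecast_interval(point=5.61, half_width=0.32)
print(interval_2015)
```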
Beyond 2015, my forecast through 2019 looks like this:[3]
Admittedly, this forecast is probably simplistic and unimpressive. That said, there’s a perfectly good mathematical reason why, which was implicit above: Aside from randomness, the leaguewide ANY/A in a given year doesn’t depend on the leaguewide ANY/A two or more years prior. So, starting in 2016 (i.e., two years into the future), an ARIMA forecast only adds the aforementioned 0.04 ANY/A of per-year growth and (anthropomorphically) washes its hands of the situation from then on.
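In other words, the point forecasts beyond 2015 are a straight line. A minimal Python sketch of that logic, assuming the rounded 5.61 starting point and 0.04 yearly growth quoted above (so the values may differ from my actual R output in the second decimal; the function name is hypothetical):

```python
def project_forward(first_year, first_forecast, drift, last_year):
    """Extend a one-step forecast by adding the yearly drift for each later year."""
    return {
        year: round(first_forecast + drift * (year - first_year), 2)
        for year in range(first_year, last_year + 1)
    }


path = project_forward(first_year=2015, first_forecast=5.61,
                       drift=0.04, last_year=2019)
for year, value in path.items():
    print(year, value)
```

Only the point forecasts follow this line. The uncertainty around them would normally widen with each additional year, which this sketch does not attempt to model.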
My ARIMA model is based solely on the values of ANY/A from 1979 to 2014, and therefore doesn’t know dick about the fact that the outlying ANY/A last season may have been due to the NFL’s renewed focus on constraining the toolkit of defensive backs. This could very well be a fatal flaw in my projection.
DT : IR :: TL : DR
Projecting the NFL’s passing environment in 2015 is — frustratingly — vital for offensive player projections, yet difficult to calculate. ARIMA (somewhat) alleviates this frustration:
- The trajectory of leaguewide ANY/A over the past 35 years can be explained by simple exponential smoothing and a growth factor.
- My forecast of leaguewide ANY/A for 2015 is 5.61, with a 95 percent confidence interval ranging from 5.29 to 5.93. Yes, it’s a decrease of 0.12 from 2014, but that’s primarily because last year’s ANY/A was so out of whack with what it “should have been.”
- The model’s naivete about 2014’s tightening of restrictions on defensive backs means its ANY/A forecast could be way off.
1. For those who want to validate/supplement my analysis or want to experiment with coding time series analysis in R, e-mail me and I’ll gladly send you my R syntax.
2. Long-time readers and/or new readers with knowledge about ARIMA might bring a (totally valid) lack-of-model-validation critique here. My response is that 36 data points aren’t enough to reliably validate my model. I freely admit that’s a limitation of my analysis.
3. The 95 percent confidence interval is shaded in grey.