Forecasting the 2015 NFL Offensive Environment: ANY/A

With training camps opening this week, fantasy football draft season is quickly approaching. And one of the most under-appreciated aspects of using statistics to calculate player projections is forecasting the upcoming season’s offensive environment. It’s not a big thing if you don’t take it into account, but your projections will be slightly less accurate because of that omission.

Although not directly scored in fantasy leagues, Adjusted Net Yards per Attempt (ANY/A) is nevertheless the most reliable measure of a QB’s “true” passing prowess. Furthermore, ANY/A is, for the most part, indirectly scored in fantasy leagues via the fact that passing yards, touchdowns, and interceptions feature in its formula. Therefore, it seems useful to try to forecast what leaguewide ANY/A will be in 2015; and that’s exactly what you’re going to read about in this post.
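For reference, the ANY/A formula can be written as below. It matches the leaguewide calculation worked through in the comments at the end of this post; the abbreviations (Yds, TD, INT, SkYds, Att, Sk) are shorthand labels chosen here for illustration, with Yds meaning gross passing yards.

```latex
\[
\text{ANY/A} = \frac{\text{Yds} + 20\,\text{TD} - 45\,\text{INT} - \text{SkYds}}{\text{Att} + \text{Sk}}
\]
```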

The Basics of ARIMA

The trajectory of ANY/A over the course of NFL history is what’s called “time series” data, and an Autoregressive Integrated Moving Average (ARIMA) model is particularly adept at explaining and forecasting that kind of data. Without getting too technical, an ARIMA model consists of the following:

• An autoregression (AR) component, which represents the extent to which one observation (in this case, ANY/A in a given year) depends on previous observations.
• An integrated (I) difference, which is mostly an esoteric methodological device used to set up the data so that it adheres to a vital statistical assumption.
• A moving average (MA) component, which represents the extent to which one observation depends on how far previous observations have deviated from the mean.
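To make the "integrated" piece a little less esoteric: it usually amounts to modeling year-over-year changes rather than raw values, so that a steadily trending series looks more stable (the assumption in question is typically stationarity). Here is a minimal sketch of first differencing in Python; the numbers are hypothetical and are not real ANY/A data.

```python
# Hypothetical yearly values with an upward drift (NOT real ANY/A data).
series = [5.0, 5.1, 5.3, 5.2, 5.5]


def first_difference(values):
    """Return year-over-year changes: values[t] - values[t-1]."""
    return [round(b - a, 2) for a, b in zip(values, values[1:])]


changes = first_difference(series)
print(changes)  # [0.1, 0.2, -0.1, 0.3]
```

The differenced series has one fewer observation than the original, and it is these changes that the AR and MA components then describe.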

To fit an ARIMA model, one can use any of several pieces of statistical software: I used R.1

Methods

Via Pro Football Reference, I collected leaguewide ANY/A data since the AFL-NFL merger in 1970. After loading the data in R, I plotted it and saw the following:

It doesn’t take a rocket scientist to see that the long-term trend that continues today started in 1979, which makes real-world sense given the enactment of the Mel Blount Rule the previous year. Therefore, I restricted my time series to leaguewide ANY/A from 1979 to 2014.

Otherwise, there wasn’t much left to do besides running the R syntax and checking the residuals to make sure everything was kosher statistically.2

Results

It turns out that the trajectory of leaguewide ANY/A over time is best forecast by a simple exponential smoothing model with a growth factor. This is what that means in English: After eliminating the tendency for the current year’s ANY/A to depend on the previous year’s ANY/A, the only factors that matter for forecasting next year’s ANY/A are

• The reliable, long-term, upward trajectory of ANY/A over time; and
• How far last year’s ANY/A forecast deviated from last year’s actual ANY/A.
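In equation form, and assuming the fitted model is what is usually written as ARIMA(0,1,1) with drift (the standard ARIMA counterpart of simple exponential smoothing with growth), the two factors above look like this. The symbols are labels chosen here: y_t is ANY/A in year t, c is the yearly growth, θ is the moving-average coefficient, and ε_t is the actual value minus the forecast in year t.

```latex
\[
y_t - y_{t-1} = c + \varepsilon_t + \theta\,\varepsilon_{t-1}
\qquad\Longrightarrow\qquad
\hat{y}_{t+1} = y_t + c + \theta\,\varepsilon_t
\]
```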

For instance, here’s a breakdown of my ARIMA model’s forecast for leaguewide ANY/A in 2015:

• The leaguewide ANY/A for 2014 was 5.73.
• The long-term trajectory of ANY/A since 1979 has seen an increase of 0.04 ANY/A per year.
• Last year’s forecast underestimated the actual ANY/A in 2014 by 0.22.
• My model says that, for every one-unit underestimation last year, there’s a 0.73-unit decrease in the forecast for this year (i.e., the coefficient is -0.73).
• 5.73 + 0.04 + (0.22 * -0.73) = 5.73 + 0.04 - 0.16 = 5.73 - 0.12 = 5.61.
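The arithmetic in the breakdown above can be reproduced in a few lines of Python. This is only a sketch using the rounded figures quoted in the list, not the R syntax behind the actual model; the function name and its parameter names are made up for illustration.

```python
def one_step_forecast(last_actual, drift, theta, last_error):
    """One-step-ahead forecast: last value, plus growth, plus a correction.

    last_error is (actual - forecast) for the most recent year, so a
    positive value means the model underestimated that year.
    """
    return last_actual + drift + theta * last_error


forecast_2015 = one_step_forecast(
    last_actual=5.73,  # leaguewide ANY/A in 2014, as quoted above
    drift=0.04,        # long-term increase per year, as quoted above
    theta=-0.73,       # coefficient applied to last year's miss
    last_error=0.22,   # 2014 actual minus 2014 forecast
)
print(round(forecast_2015, 2))  # 5.61
```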

So there you have it: My ARIMA model forecasts the leaguewide ANY/A in 2015 to be 5.61. Now, since we’re talking about a forecast, there’s going to be error involved. Therefore, I’d be remiss if I didn’t also report that the standard error of the forecast is 0.32, and so the 95 percent confidence interval for leaguewide ANY/A in 2015 ranges between 5.29 and 5.93.

Beyond 2015, my forecast through 2019 looks like this:3

Admittedly, this forecast is probably simplistic and unimpressive. That said, there’s a perfectly good mathematical reason why, which was implicit above: Aside from randomness, the leaguewide ANY/A in a given year doesn’t depend on the leaguewide ANY/A two or more years prior. So, starting in 2016 (i.e., two years into the future), an ARIMA forecast only adds the aforementioned growth of 0.04 ANY/A per year and (anthropomorphically) washes its hands of the situation from then on.
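As a rough illustration of that "add 0.04 and move on" behavior, the point forecasts through 2019 can be sketched in Python from the rounded numbers quoted in this post. Because the inputs are rounded, the values may differ slightly from what the fitted model itself produces, and the variable names are just illustrative.

```python
forecast_2015 = 5.61  # one-step forecast quoted in this post
drift = 0.04          # yearly growth quoted in this post

# Each additional year ahead adds one more increment of growth.
point_forecasts = {
    year: round(forecast_2015 + drift * (year - 2015), 2)
    for year in range(2015, 2020)
}

for year, value in point_forecasts.items():
    print(year, value)
```

Note that this only covers the point forecasts; the uncertainty around them grows with each year ahead and is not reproduced here.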

Limitations

My ARIMA model is based solely on the values of ANY/A from 1979 to 2014, and therefore doesn’t know dick about the fact that the outlying ANY/A last season may have been due to the NFL’s renewed focus on constraining the toolkit of defensive backs. This could very well be a fatal flaw in my projection.

DT : IR :: TL : DR

Projecting the NFL’s passing environment in 2015 is — frustratingly — vital for offensive player projections, yet difficult to calculate. ARIMA (somewhat) alleviates this frustration:

• The trajectory of leaguewide ANY/A over the past 35 years can be explained by simple exponential smoothing and a growth factor.
• My forecast of leaguewide ANY/A for 2015 is 5.61, with a 95 percent confidence interval ranging from 5.29 to 5.93. Yes, it’s a decrease of 0.12 from 2014, but that’s primarily because last year’s ANY/A was so out of whack with what it “should have been.”
• The model’s naivete about 2014’s tightening of defensive back restrictions means its ANY/A forecast could be way off.

1. For those who want to validate/supplement my analysis or want to experiment with coding time series analysis in R, e-mail me and I’ll gladly send you my R syntax.

2. Long-time readers and/or new readers with knowledge about ARIMA might bring a (totally valid) lack-of-model-validation critique here. My response is that 36 data points aren’t enough to reliably validate my model. I freely admit that’s a limitation of my analysis.

3. The 95 percent confidence interval is shaded in grey.

1. Red

League ANY/A in 2014 was 6.14 not 5.73. Kinda screws up your whole analysis…

• Danny Tuccitto

Thanks for the feedback.

I figured someone might bring this up. Here’s the explanation:

For whatever reason, Pro Football Reference’s listings for leaguewide ANY/A don’t equal what they should if you calculate them manually. For instance, as you said, they show 2014 as having a 6.14 ANY/A, but that doesn’t mesh with the listed totals for passing yards, touchdowns, interceptions, sacks, and sack yards. If we use the totals, which is the correct thing to do, then we get the following calculation:

ANY/A = [121247 – 7651 + (807*20) – (450*45)] / (17879 + 1212)
= (121247 – 7651 + 16140 – 20250) / (17879 + 1212)
= 109486 / 19091
= 5.73

And, although you only mentioned 2014, it turns out that this discrepancy runs throughout PFR’s “passing environment” page.

I honestly do not know for certain why their calculations are wrong on that page, but I suspect it might be due to them either a) only using qualifying quarterbacks, or b) committing an “average of the averages is not the average” error whereby they’re reporting what the ANY/A was based on the average of 512 team games as opposed to what the ANY/A was based on leaguewide totals. And to add even more weirdness, it’s probably not the latter explanation because the calculations for all of the rest of their rate stats seem to have been done the way I calculated ANY/A above.

Hope that clears it up. Thanks again.

2. Red

Thanks for the reply, Danny. I figured out what’s going on here: PFR displays total NET passing yards, so when you add together PFR’s totals, you end up double counting sack yardage. Try using 128,898 as your gross yardage total and I bet you get 6.14 ANY/A.

I wish PFR would fix this, as it sure makes things confusing!
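The two numbers in this exchange can be checked side by side with a short Python sketch. It uses only the 2014 totals quoted in the comments above; the function and variable names are made up for illustration.

```python
def any_a(gross_yards, touchdowns, interceptions, sacks, sack_yards, attempts):
    """Adjusted Net Yards per Attempt computed from gross passing yards."""
    adjusted_yards = (
        gross_yards + 20 * touchdowns - 45 * interceptions - sack_yards
    )
    return adjusted_yards / (attempts + sacks)


# 2014 leaguewide totals as quoted in the comments above.
net_yards = 121247   # PFR's listed total, already net of sack yardage
sack_yards = 7651
totals = dict(touchdowns=807, interceptions=450, sacks=1212,
              sack_yards=sack_yards, attempts=17879)

# Treating the net total as gross subtracts sack yardage a second time.
double_counted = any_a(gross_yards=net_yards, **totals)

# Adding sack yardage back recovers gross yards (128,898).
corrected = any_a(gross_yards=net_yards + sack_yards, **totals)

print(round(double_counted, 2))  # 5.73
print(round(corrected, 2))       # 6.14
```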