Pass-Run Ratio: MLM or OLS?

Using the intraclass correlation coefficient (ICC), I’ve previously shown that a team’s Pythagorean Win Percentage (PythW%) and Pro Football Reference’s Simple Rating System (SRS) in a given season are dependent on the identity of their head coach (HC). These findings suggest that

  1. ordinary least squares (OLS) regression is biased when using PythW% or SRS as outcome variables; and
  2. attempts to predict a given team’s PythW% or SRS in a given season are missing a big piece of the puzzle if they omit HC factors.

Today, I’m going to continue this line of study by looking at offensive pass-run ratio (P-R%), which is simply the proportion of a team’s total plays that are pass attempts or sacks.[1]

I think P-R% is a good stat to examine in the context of MLM vs. OLS for two reasons. First, if we think about this intuitively, whereas PythW% and SRS are team-wide stats over which the HC has some control, P-R% is a stat over which the HC has considerable control. If an HC has an offensive background, he’ll dance with the one that brung ‘im. If, on the other hand, he has a defensive or special teams background, he still has control over the hiring of an offensive coordinator, and that person will almost always have a P-R% preference compatible with his own.

Second, I believe one of the most practical descriptive stats to emerge onto the NFL analytics scene over the past couple of years is Chase Stuart’s Offensive Identity metric. Conceptually inspired by Football Outsiders’ mantra, “you run when you win, not win when you run,” Offensive Identity uses a team’s season-to-date Game Scripts to produce a P-R% index that’s independent of the team’s average scoreboard situation. Offensive Identity has great potential for predicting games (for educational purposes only) or predicting player stats (for fantasy purposes only) — but only if it turns out to be a predictive, rather than descriptive, stat. And unfortunately, whether or not we’re able to use Offensive Identity to predict P-R% for a given team in a given game remains an open question.

These two ideas — that HCs have considerable control over P-R% and that Offensive Identity’s potential usefulness comes from being able to predict the P-R% for a given team in a given game — suggest that MLM might be a statistical technique that’s tailor-made for P-R%. So, let’s get to it. When trying to predict/explain variance in P-R%, should we use MLM or stick with OLS?

Methods

From Pro Football Reference, I obtained P-R% data for all team seasons from 1978 to 2014. After excluding HC regimes that lasted fewer than three seasons, my final sample comprised 979 team seasons across 175 HC regimes. I then used Mplus to estimate a Random Effects Analysis of Variance (RA) model, the output of which includes the overall HC-adjusted P-R% mean, the within-HC variance, the between-HC variance, the standard error of the between-HC variance, the ICC, and the Design Effect (Deff).
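For readers without Mplus, the variance decomposition behind an RA model can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the classical method-of-moments (one-way ANOVA) estimator rather than whatever estimator Mplus applies, and the function name `variance_components` and the toy regime data are hypothetical, not the 979 team seasons analyzed here.

```python
from statistics import mean


def variance_components(groups):
    """Method-of-moments estimates for a one-way random-effects ANOVA.

    groups: dict mapping an HC-regime label to a list of season P-R% values.
    Returns (grand_mean, within_variance, between_variance).
    """
    sizes = [len(values) for values in groups.values()]
    n_total = sum(sizes)
    n_groups = len(groups)
    group_means = {label: mean(values) for label, values in groups.items()}
    grand_mean = sum(sum(values) for values in groups.values()) / n_total

    # Within-regime mean square: pooled squared deviations from each regime's mean.
    ss_within = sum(
        (y - group_means[label]) ** 2
        for label, values in groups.items()
        for y in values
    )
    ms_within = ss_within / (n_total - n_groups)

    # Between-regime mean square: size-weighted squared deviations of regime means.
    ss_between = sum(
        len(values) * (group_means[label] - grand_mean) ** 2
        for label, values in groups.items()
    )
    ms_between = ss_between / (n_groups - 1)

    # Effective regime length; equals the common length when regimes are balanced.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_groups - 1)

    # Truncate at zero, since a variance cannot be negative.
    between_var = max((ms_between - ms_within) / n0, 0.0)
    return grand_mean, ms_within, between_var


# Hypothetical toy data: two three-season regimes (NOT real NFL numbers).
toy = {"Regime A": [0.50, 0.52, 0.54], "Regime B": [0.60, 0.62, 0.64]}
grand_mean, within_var, between_var = variance_components(toy)
print(grand_mean, within_var, between_var)
```

A maximum-likelihood fit will generally give somewhat different numbers, especially when regimes vary a lot in length, so treat this as a way to see what the within-HC and between-HC variances mean rather than as a replication of the analysis.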

Results

Over the past 37 NFL seasons, the variance of P-R% within HC regimes was 0.18%, which translates to a standard deviation of 4.3%. Given a normal distribution, this means that 95% of seasons during the same HC tenure for the same team could be expected to have a P-R% that falls between 46.4% and 63.0%. Meanwhile, the average HC regime has exhibited a 54.7% P-R% with a standard error of 0.03%, which makes it highly statistically significantly different from zero, and therefore means that distinct HC regimes significantly differ with respect to P-R%. Finally, the between-HC variance was 0.14%, which translates to a standard deviation of 3.7%. Therefore, 95% of HC regimes could be expected to have a P-R% between 47.0% and 58.4%.

Getting down to the nitty-gritty, the ICC — as a reminder — is calculated as

ICC = between-HC variance / (between-HC variance + within-HC variance)

so, in the case of P-R%, it equals 0.0014/(0.0014 + 0.0018), which equals 0.44, or 44%.[2] Therefore, 44% of the total variance in P-R% since 1978 depends on the identity of the HC; or alternatively, 44% of the variance in P-R% can be explained by HC-related factors.
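The ICC arithmetic is easy to check in a couple of lines of Python. This is just a sketch of the calculation above; the function name `icc` is my own, and the inputs are the rounded variances reported in this post.

```python
def icc(between_var, within_var):
    """Intraclass correlation: share of total variance that lies between clusters."""
    return between_var / (between_var + within_var)


# Rounded variance components reported above (as proportions, not percentages).
between_hc = 0.0014
within_hc = 0.0018

rho = icc(between_hc, within_hc)
print(round(rho, 2))  # 0.44
```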

From there, the formula to calculate Deff is

Deff = 1 + [(m - 1) * ρ]

where m is the average duration of an HC regime and ρ is the ICC, which in this case equal 5.59 and 0.44, respectively. Therefore,

Deff = 1 + [(5.59 - 1) * 0.44] = 1 + (4.59 * 0.44) = 3.01.[3]

And since a Deff of 3.01 means that using OLS would produce sampling variances that are less than half of what they actually are in the population of NFL teams (and standard errors that are correspondingly too small), any statistical analysis with P-R% as the dependent variable should use MLM instead.[4]
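Here is a minimal Python sketch of the Deff calculation, using the sample sizes and rounded variances reported above. The function name `design_effect` is my own. One assumption worth flagging: under the usual definition, Deff is a ratio of sampling variances, so the sketch prints both the variance ratio (1 / Deff) and the standard-error ratio (1 / √Deff).

```python
import math


def design_effect(avg_cluster_size, icc):
    """Design effect for clustered data: 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc


# Average regime length: 979 team seasons spread across 175 HC regimes.
m = 979 / 175
# ICC from the rounded variance components reported above.
rho = 0.0014 / (0.0014 + 0.0018)

deff = design_effect(m, rho)
print(round(deff, 2))                  # 3.01
print(round(1 / deff, 3))              # OLS variance as a share of the clustered variance
print(round(1 / math.sqrt(deff), 3))   # OLS standard error as a share of the clustered one
```

Using the unrounded m and ICC is what yields 3.01; plugging in the rounded 5.59 and 0.44 gives a value closer to 3.02.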

DT : IR :: TL : DR

Based on data from 979 team seasons across 175 HC regimes since 1978, I found that

  • The average P-R% expectation for a random HC regime is between 47.0% and 58.4%.
  • HC regimes vary significantly with respect to their average P-R%.
  • 44% of the total P-R% variation can be explained by HC-related factors.
  • OLS standard errors would be highly underestimated in a regression model that uses P-R% as the outcome variable.

The main practical implications of this analysis are that (a) P-R% does indeed follow the HC, and (b) optimal prediction of P-R% for a given team in a given game requires MLM, not OLS.


  1. NFL play-by-play is riddled with errors related to whether or not a quarterback carry was a scramble or a designed run, so there is some measurement error inherent in this analysis. That said, because these plays are a tiny portion of the overall sample, I don’t believe said error biases my results to a significant degree. 

  2. Intentionally rounded. 

  3. Intentionally rounded. 

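  4. 1 / 3.01 = .332, aka “less than half.” Strictly speaking, Deff scales the sampling variance, so the corresponding ratio for standard errors is 1 / √3.01 = .576.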
