Simple Rating System: MLM or OLS?

I’ve previously found that, when we’re trying to predict Pythagorean Win Percentage (PythW%), multilevel modeling (MLM) is a better statistical technique than ordinary least squares (OLS) regression. That’s because 36% of the total PythW% variance is due to characteristics of the head coaching (HC) regime. Or to put it another way, there’s a .36 correlation between two random seasons during the same coaching regime.

Of course, PythW% is just one of a myriad of team outcomes in NFL analytics, so “MLM or OLS?” is a question I’ll frequently be asking in this internet space. The subject of today’s interrogation is Pro-Football-Reference’s (PFR’s) Simple Rating System (SRS).

The Nuts and Bolts of SRS

A given team’s SRS represents its schedule-adjusted average victory margin. For instance, Seattle’s league-leading +13.0 SRS in 2013 means that they were expected to defeat an average NFL team by 13 points.

The computational mechanics underlying SRS involve recursion and iteration, which sounds more complicated than it actually is.[1] Take the 2013 Seahawks, for example. Seattle had a unique SRS formula based on (a) their own game-by-game victory margins and (b) the SRSs of their 13 opponents, which start out as unknowns. The recursive bit is that the 13 unknowns in Seattle’s SRS formula refer to the SRS formulas of Seattle’s opponents, each of which also has 13 unknowns of its own. In other words, we can’t calculate Seattle’s SRS without first calculating Carolina’s SRS, but we can’t calculate Carolina’s SRS until we’ve calculated San Francisco’s SRS, and we can’t calculate San Francisco’s SRS until we’ve calculated Seattle’s SRS, which is how we got into this mess in the first place.

Iteration, on the other hand, is the way we get out of this never-ending recursive loop, and it simply amounts to having a computer complete the following steps:

  1. Simultaneously plug hypothetical values into the 13 unknowns for all 32 formulas.
  2. Add up the 32 resulting SRSs from Step 1.
  3. Check to see if the sum in Step 2 equals zero.
  4. If “yes” in Step 3, halt. If “no,” repeat Steps 1-3.

This iterative process usually gets to yes within a few seconds, so we’re not talking about some intractable cryptography problem here.
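For the curious, here is a minimal Python sketch of that kind of iteration. It is not PFR's code: the function name, the `(team_a, team_b, margin)` input format, and the three-team schedule are all made up for illustration. It also uses a slightly different stopping rule than the steps above, which is an assumption on my part: each pass sets a team's rating to its average margin plus the average rating of its opponents, recentres the ratings so they sum to zero, and halts once the ratings stop changing.

```python
def simple_rating_system(games, tol=1e-9, max_iter=10_000):
    """Iteratively solve rating = average margin + average opponent rating.

    games: list of (team_a, team_b, margin), margin = points_a - points_b.
    Hypothetical helper for illustration only.
    """
    margins, opponents = {}, {}
    for a, b, m in games:
        margins.setdefault(a, []).append(m)
        margins.setdefault(b, []).append(-m)
        opponents.setdefault(a, []).append(b)
        opponents.setdefault(b, []).append(a)

    avg_margin = {t: sum(ms) / len(ms) for t, ms in margins.items()}
    ratings = dict(avg_margin)  # starting guess: no schedule adjustment

    for _ in range(max_iter):  # cap the loop in case a schedule never settles
        new = {
            t: avg_margin[t]
            + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in ratings
        }
        # Recentre so the ratings sum to zero (an average team rates 0).
        mean = sum(new.values()) / len(new)
        new = {t: r - mean for t, r in new.items()}
        change = max(abs(new[t] - ratings[t]) for t in new)
        ratings = new
        if change < tol:
            break
    return ratings


# Toy round robin (made-up scores): A beat B by 7, B beat C by 3, A beat C by 10.
toy_games = [("A", "B", 7), ("B", "C", 3), ("A", "C", 10)]
toy_ratings = simple_rating_system(toy_games)
print({t: round(r, 2) for t, r in toy_ratings.items()})
# {'A': 5.67, 'B': -1.33, 'C': -4.33}
```

In the toy example, the ratings reproduce every margin exactly (5.67 minus -1.33 is 7, and so on), which is what "schedule-adjusted average victory margin" looks like when the results are perfectly consistent.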


Everything’s the same as before, just using SRS instead of PythW%:

  1. Get data for the 1,285 team seasons across 305 HC regimes since 1970.
  2. Estimate a Random Effects Analysis of Variance (RA) model to obtain
    1. the overall HC-adjusted mean,
    2. the within-HC variance,
    3. the between-HC variance, and
    4. the standard error of the between-HC variance.
  3. Calculate the Intraclass Correlation Coefficient (ICC).
  4. Calculate the Design Effect (Deff).
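To make Step 2 a bit more concrete, here is a rough Python sketch of how within-HC and between-HC variance can be pulled apart. A caveat: RA models are typically fit by (restricted) maximum likelihood in a stats package, and this sketch swaps in the classical one-way ANOVA moment estimator because it needs no libraries. It does not produce the standard error of the between-HC variance, and the function name and toy numbers are hypothetical.

```python
def variance_components(groups):
    """ANOVA (method-of-moments) estimates for a one-way random effects model.

    groups: dict mapping a regime label to its list of season values.
    Returns (between-group variance, within-group variance).
    Needs at least two groups and at least one group with 2+ seasons.
    """
    k = len(groups)
    sizes = [len(v) for v in groups.values()]
    n_total = sum(sizes)
    grand_mean = sum(sum(v) for v in groups.values()) / n_total

    ss_within = 0.0
    ss_between = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_within += sum((x - group_mean) ** 2 for x in values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2

    ms_within = ss_within / (n_total - k)
    ms_between = ss_between / (k - 1)
    # Effective group size; equals the common size when groups are balanced.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    var_between = max(0.0, (ms_between - ms_within) / n0)
    return var_between, ms_within


# Made-up data: two "regimes" with two seasons each.
toy = {"Regime A": [1.0, 3.0], "Regime B": [5.0, 7.0]}
print(variance_components(toy))  # (7.0, 2.0)
```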


The RA model estimated an HC-adjusted mean of -1.3 with a between-HC variance of 14.6, which means that a random HC regime can be expected to produce an average SRS between -8.8 and +6.2.[2] The standard error of between-HC variance was 1.9, which resulted in a t-statistic of 7.7 and corresponding p-value less than 0.001. Therefore, we can conclude that there’s statistically significant SRS variation between HC regimes.
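The arithmetic behind those numbers can be checked in a few lines of Python. The inputs are the estimates reported above; the 1.96 multiplier is my assumption (the usual normal approximation for a 95% range), chosen because it reproduces the reported interval.

```python
import math

mean_srs = -1.3        # HC-adjusted mean (reported above)
var_between = 14.6     # between-HC variance (reported above)
se_var_between = 1.9   # standard error of the between-HC variance

sd_between = math.sqrt(var_between)      # about 3.82 points
lower = mean_srs - 1.96 * sd_between     # assumes a normal approximation
upper = mean_srs + 1.96 * sd_between
t_stat = var_between / se_var_between    # estimate divided by its standard error

print(round(lower, 1), round(upper, 1), round(t_stat, 1))  # -8.8 6.2 7.7
```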

The estimated within-HC variance was 24.6, so to calculate the ICC, we use the formula

$$\mathrm{ICC} = \frac{s_b^2}{s_b^2 + s_w^2}$$

whereby $s_b^2$ is the between-HC variance and $s_w^2$ is the within-HC variance. In the case of SRS, that means

ICC = 14.6 / (14.6 + 24.6) = 14.6 / 39.2 = 0.37[3]

and therefore, we can conclude any of the following:

  1. 37% of the total SRS variation is between HC regimes.
  2. 37% of the total SRS variation can be explained by HC-related factors.
  3. Random seasons within the same HC regime have a 0.37 correlation, on average.
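As a quick check, the ICC is just the between-HC share of the total variance. The function name below is hypothetical; the inputs are the two variance estimates reported above.

```python
def intraclass_correlation(var_between, var_within):
    """Share of total variance that lies between groups."""
    return var_between / (var_between + var_within)


icc_srs = intraclass_correlation(14.6, 24.6)
print(round(icc_srs, 2))  # 0.37
```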

Now that we have the ICC, we can calculate the Deff using the formula

$$\mathrm{Deff} = 1 + (m - 1)\rho$$

where m is the average duration of an HC regime (1,285 seasons / 305 regimes ≈ 4.2 seasons) and ρ is the ICC. So,

Deff = 1 + [(4.2 – 1) * 0.37] = 1 + (3.2 * 0.37) = 2.2.[4]

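Because the ICC tells us that SRS data is a clustered sample, not a random one, a Deff of 2.2 means that using OLS to predict or explain SRS would produce sampling variances that are less than half of what they actually are in the population of NFL teams.[5] Since standard errors scale with the square root of the variance, the OLS standard errors would be roughly two-thirds of their true size.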
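Here is the same Deff calculation as a short Python sketch, using the rounded inputs from above (the function name is hypothetical). It also prints two ratios: 1 / Deff, which is how much a variance estimate shrinks when clustering is ignored, and 1 / sqrt(Deff), the corresponding shrinkage on the standard error scale.

```python
import math


def design_effect(avg_cluster_size, icc):
    """Deff = 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc


m = round(1285 / 305, 1)          # average regime length, rounded to 4.2
deff = design_effect(m, 0.37)     # ICC rounded to 0.37, as in the text

print(round(deff, 1))                  # 2.2
print(round(1 / deff, 2))              # 0.46 (variance ratio)
print(round(1 / math.sqrt(deff), 2))   # 0.68 (standard error ratio)
```

The ratios differ slightly from 1 / 2.2 = .45 only because the code carries the unrounded Deff of 2.184 forward.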

DT : IR :: TL : DR

Based on data from 1,285 team seasons across 305 HC regimes since 1970, I found the following:

  1. The average SRS expectation for a random HC regime is between -8.8 and +6.2.
  2. HC regimes vary significantly with respect to their average SRS.
  3. 37% of the total SRS variation can be explained by HC-related factors.
  4. OLS standard errors would be highly underestimated in a regression model that uses SRS as the outcome variable.

Therefore, when we’re interested in predicting future SRS or explaining past SRS variance, we should use MLM.


  1. In fact, it’s so easy, you can even learn how to do it yourself in just 5:06 thanks to PFR’s tutorial video.

  2. This is the 95% confidence interval. 

  3. Intentionally rounded. 

  4. Intentionally rounded. 

  5. 1 / 2.2 = .45, aka “less than half.” 
