# Pythagorean Win Percentage: MLM or OLS?

To demonstrate that multilevel modeling (MLM) is less biased than ordinary least squares (OLS) regression, both in terms of statistical assumptions and in its handling of variance, I’ve relied so far on a tiny sample: 4 team seasons for each of 8 head coaches. Today, I’m going to use a more typical sample in NFL analytics to answer an important question about Pythagorean Win Percentage (PythW%): Should we use MLM or stick with OLS?

### The Data¹

I got PythW% and head coach (HC) data from Pro Football Reference (PFR) for every NFL team since 1970.² I then identified 305 unique HC “eras,” defined as continuous years of the same head coach running the same team.³ For instance, Tony Dungy’s six-year Buccaneers era is distinct from (among others) Jon Gruden’s seven-year Bucs era, Jim Caldwell’s three-year Colts era, and Dungy’s own seven-year Colts era.

### The Random Effects Analysis of Variance Model

As I wrote last week, the decision to use MLM based on HC eras is made a priori and comes from the theory that an NFL team, led by its HC, performs in a relatively homogenous, relatively stable social environment.⁴ To double-check that this is the case, we have to run a random effects analysis of variance (RA) model. In future posts, I’ll go into more technical detail about all of the various types of MLM models,⁵ but the important things for now are that the RA model is the most basic type, and it produces three estimates that tell us about the multilevel nature of PythW%:

1. The within-HC variance: Compared to the “average” season in the same HC era, how much does PythW% vary from one season to another?
2. The between-HC variance: Compared to an “average” HC era, how much does PythW% vary from one HC to another?
3. The standard error of between-HC variance: Is the between-HC variance significantly greater than zero?
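To make the variance decomposition concrete, here's a minimal sketch of the same idea using pure-Python method-of-moments (one-way ANOVA) estimators on a made-up toy dataset. The coach names and numbers below are invented for illustration; the post's actual estimates come from an MLM fit to the full PFR data, so treat this only as a demonstration of how within- and between-HC variance get separated.

```python
from statistics import mean

# Toy data: PythW% by hypothetical HC era -- NOT the post's PFR dataset.
eras = {
    "Coach A": [0.55, 0.60, 0.58, 0.62],
    "Coach B": [0.40, 0.38, 0.45],
    "Coach C": [0.52, 0.48, 0.50, 0.47, 0.51],
}

def ra_variances(groups):
    """One-way random-effects variance decomposition via ANOVA
    (method-of-moments) estimators. Returns (within_var, between_var)."""
    k = len(groups)                      # number of clusters (HC eras)
    N = sum(len(g) for g in groups)      # total team seasons
    grand = sum(sum(g) for g in groups) / N
    # Within-cluster sum of squares -> mean square within (MSW)
    ssw = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    msw = ssw / (N - k)
    # Between-cluster sum of squares -> mean square between (MSB)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    msb = ssb / (k - 1)
    # Effective cluster size, adjusted for unbalanced clusters
    n0 = (N - sum(len(g) ** 2 for g in groups) / N) / (k - 1)
    between = max(0.0, (msb - msw) / n0)  # truncate at zero if negative
    return msw, between

within, between = ra_variances(list(eras.values()))
print(f"within-HC variance:  {within:.4f}")
print(f"between-HC variance: {between:.4f}")
```

With these toy numbers, the between-HC variance dwarfs the within-HC variance because each fake coach's seasons cluster tightly around a very different level; real PythW% data is noisier, as the estimates below show.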

### RA Model Estimates

It turns out that the estimated within-HC variance was 1.9% and the estimated between-HC variance was 1.1%. The latter’s standard error (.001) produces a t-statistic of 7.7, which means HC eras vary significantly with respect to their expected PythW% (i.e., their intercepts).

### The Intraclass Correlation Coefficient

Recall from my post about variance that the intraclass correlation coefficient (ICC) is the ratio of between-group variance to total variance, both of which we just obtained from the RA model. Therefore, we can now make the following calculation:

ICC = 0.011/(0.011 + 0.019) = 0.011/0.030 = 0.36⁶

which means we can make three equally valid interpretations:

1. 36% of the total PythW% variation is between HC eras.
2. 36% of the total PythW% variation can be explained by HC-related factors.
3. The correlation of PythW% for two seasons within the same HC era is 0.36.

The general rule of thumb is that an ICC around .20 is large enough to suggest using MLM, so we’ve far exceeded that threshold with PythW%.

### The Design Effect

If we’re still not convinced, though, we can calculate the design effect (Deff), which tells us by what factor OLS would understate the sampling variance if we didn’t group team seasons by HC era. All we need to calculate it are the ICC and the average cluster size. In this case, the average HC since 1970 has lasted 4.2 seasons with the same team, so our calculation is as follows:

Deff = 1 + [(4.2 – 1) * 0.36] = 1 + (3.2 * 0.36) = 2.16⁷

The general rule of thumb is that Deff around 2.0 is large enough to suggest using MLM, so we’ve crossed the barrier once again with PythW%.
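The Deff arithmetic can likewise be checked in a few lines. Note that with these already-rounded inputs the result comes out a hair below the rounded figure above, and that the inflation of OLS standard errors (as opposed to sampling variances) goes by the square root of Deff:

```python
icc = 0.36          # intraclass correlation (post's rounded value)
avg_cluster = 4.2   # average seasons per HC era since 1970

# Design effect: variance inflation from clustering by HC era
deff = 1 + (avg_cluster - 1) * icc
print(round(deff, 2))

# OLS standard errors would need to grow by sqrt(Deff), about 1.47x here
se_inflation = deff ** 0.5
print(round(se_inflation, 2))
```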

### DT : IR :: TL : DR

With data from 1,285 team seasons across 305 HC eras since 1970, I obtained all the statistical information I needed to decide whether or not we should start using MLM in the context of PythW%. Here’s what I found:

1. There’s significant variation between HC eras in terms of their average PythW%s.
2. 36% of the total PythW% variation can be explained by HC-related factors.
3. OLS sampling variances would be less than half what they should be (equivalently, OLS standard errors would be about a third too small) in a model using PythW% as the outcome variable.

Therefore, when we’re interested in predicting PythW%, should we use MLM or stick with OLS? The pretty clear answer here, which I’ve been foreshadowing for a week, is that we should use MLM.


1. E-mail me if you want it.

2. For teams with multiple head coaches in the same season, I used the one that began the season in control.

3. Note: I considered special cases like the 2012 Saints as part of the otherwise-continuous eight-year Sean Payton era.

4. I mean “stable” as in “consistent,” not as in “sane.”

5. You can run all of them in a panoply of statistical software, from proprietary programs like Mplus to freeware like R.

6. Intentionally rounded.

7. Intentionally rounded.