In NFL analytics, multilevel modeling (MLM) is sometimes necessary because team seasons are inherently clustered within head coaching regimes. But what about individual seasons? Aren’t they clustered within players? Why yes they are! And a technique called latent growth modeling (LGM) addresses that very situation.
It’s Just a Matter of Time
Recall that MLM overcomes two erroneous assumptions made in ordinary least squares (OLS) regression: independent observations and fixed effects. These benefits also accrue to LGM because (a) individual seasons by the same player are not independent, and (b) predictors of individual performance do not affect different players to the same extent.
The distinction between the two techniques boils down to one word: time. Whereas MLM in an NFL context explains/predicts variation in team performance as a function of its head coach, LGM explains/predicts variation in individual performance as a function of time; hence latent growth modeling.
You Could Never Know That in a Time Trap
Because of the time component, LGM also resolves a few problems that plague traditional longitudinal techniques like repeated measures ANOVA:
- Violating the assumption that all individuals start at the same baseline
- Violating the assumption that all individuals develop at the same rate
- Violating the assumption of sphericity, which has two components:
  - An individual’s standing relative to others is the same across time points.
  - Variation between individuals is the same at each time point.
Equal Baselines and Equal Growth Rates
The main application of LGM in an NFL context is via age curves, which heretofore have been analyzed using the general method developed by Bill James for baseball: Find the average performance for all players at a given age, and then plot a curve that best fits those averages. Doing the same for football usually looks something like this:
For demonstration purposes only, I’ve used data from eight quarterbacks (QB) to calculate the average Adjusted Net Yards per Attempt (ANY/A) at a given age. Each red dot represents the average ANY/A at that age, and the black line is the resulting age curve.
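The averaging method described above can be sketched in a few lines of Python. The ANY/A values below are hypothetical placeholders (not the actual data behind the graph), and the quadratic fit is just one reasonable choice of curve:

```python
import numpy as np

# Hypothetical ANY/A values for three QBs at ages 26-31 (illustrative only)
seasons = {
    "QB1": [5.2, 5.6, 5.9, 5.5, 5.1, 4.8],
    "QB2": [4.8, 5.1, 5.4, 5.6, 5.9, 6.1],
    "QB3": [5.6, 6.0, 5.8, 5.4, 5.0, 4.6],
}
ages = np.arange(26, 32)

# Bill James-style step 1: average across all players at each age
avg = np.mean(list(seasons.values()), axis=0)

# Step 2: fit a single curve (here, a quadratic) to those averages
coefs = np.polyfit(ages, avg, deg=2)
curve = np.poly1d(coefs)
for age in ages:
    print(age, round(curve(age), 2))
```

Note that the individual trajectories vanish at the `np.mean` step; everything downstream describes only the composite "average QB," which is exactly the problem discussed next.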
As was the case with MLM vis-à-vis OLS, the hidden problem is that we’ve assumed everyone is the same; it’s just that, this time, “everyone” means all individuals rather than all clusters. If we were to use the black trendline to draw conclusions about QB aging, we could only say that the average QB has a baseline of 5.5 ANY/A at Age 26, the average QB slowly progresses until his peak around Age 28, and the average QB declines quickly from Age 29 to Age 31.
But what about above-average QBs like Philip Rivers? What about late bloomers like Eli Manning? The traditional method illustrated above tells us nothing about them because it assumes all QBs have the same baseline and the same growth rate.
To drive this point home, below is the same graph, except now I’ve plotted individual age curves for four of the eight QBs:1
Although all four QBs exhibit some kind of curve, only Mr. Brown comes anywhere close to following the “average” curve. Mr. Blue starts below the average baseline at Age 26 and doesn’t peak until Age 30. Meanwhile, the other three QBs start right around average and peak at Age 27, but experience different rates of decline from Age 28 to Age 31.
It’s worth stressing that the problems I’ve described so far aren’t just sources of mental masturbation for a guy with an NFL analytics blog; there are real-world consequences to ignoring them. Using traditional repeated measures techniques on NFL data (a) makes us susceptible to drawing incorrect conclusions about players, and (b) prevents us from examining a universe of questions about how specific players develop over time.
Plotting the individual curves is also useful for illustrating how traditional ways to analyze change over time violate the sphericity assumption, which has two components: (1) The standing of individuals relative to others should be the same across time points; and (2) the variation between individual QBs should be equal across time points.
So how would sphericity show up in the graph? First, if there were the same relative standing across time, Mr. Blue would start with the lowest ANY/A at Age 26 and remain the lowest through Age 31. Instead, we see that he starts with the lowest but ends with the highest. Meanwhile, Mr. Brown starts with the highest but finishes in third place; Mr. Blond starts in third, moves up to second, and finishes in fourth. If large enough, changes like these would violate the sphericity assumption.

Second, if there were equal variation across time, the spread of the individual QB curves would remain consistent: Messrs. Brown, White, Blond, and Blue would be just as close in ANY/A at Age 26 as they are at Age 27, Age 28, and so on. Again, the graph shows this isn’t the case. For instance, although the lines are just as spread out at Age 27 as they are at Age 28, they’re much more spread out at Age 31 than they are at Age 26.
DT : IR :: TL : DR
Traditional techniques for analyzing repeated measures NFL data rely on the following assumptions, which the data routinely violate:
- Player seasons are independent of each other.
- Predictor variables affect every player to the same extent.
- All players start at the same performance baseline.
- All players develop over time at the same rate.
- Players remain in the same position relative to other players.
- Variation between players stays the same over time.
LGM requires none of these assumptions. Therefore, as was the case with MLM vis-à-vis OLS, it leads to better inferences and reveals information otherwise hidden by techniques like the Bill James method or repeated measures ANOVA.
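To make the contrast concrete: a linear latent growth model gives each player his own baseline (intercept) and growth rate (slope), which are exactly the two things the Bill James method fixes at the average. LGM is usually fit in SEM software, but for a linear growth curve an equivalent random-intercept, random-slope mixed model can be sketched with `statsmodels`. The data below are simulated, and the model names and parameters are illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate long-format data: each QB draws his own baseline and growth
# rate, then we observe noisy ANY/A at ages 26-31 (hypothetical values)
n_qbs, ages = 8, np.arange(26, 32)
rows = []
for qb in range(n_qbs):
    intercept = 5.5 + rng.normal(0, 0.6)   # player-specific baseline
    slope = 0.05 + rng.normal(0, 0.08)     # player-specific growth rate
    for age in ages:
        t = age - 26  # center time at the first observed age
        rows.append({"qb": qb, "age_c": t,
                     "anya": intercept + slope * t + rng.normal(0, 0.3)})
df = pd.DataFrame(rows)

# Linear growth model: random intercept and random slope for each QB
model = smf.mixedlm("anya ~ age_c", df, groups=df["qb"], re_formula="~age_c")
fit = model.fit()
print(fit.summary())
```

The fixed effects recover the average baseline and average growth rate (what the traditional method reports), while the random-effect variances quantify how much individual QBs depart from that average, which is the information the traditional method throws away.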
1. Mr. White is Philip Rivers, Mr. Blond is Boomer Esiason, Mr. Blue is Eli Manning, and Mr. Brown is Mark Brunell.