Of all the types of measurement validity and reliability evidence in NFL analytics, the most esoteric is evidence based on internal structure. Today’s task is to bring it down from the ivory tower using real-world examples.
A Bit of Logic Is in Order
In an NFL context, the logic underlying evidence based on internal structure goes like this:
- As the “goodness” of an NFL team is an abstract concept, we can’t quantify it directly.
- We can, however, directly quantify performance stats (e.g., yards gained) that give us some indication of whether or not a team is “good.”
- Any attempt to use quantifiable performance stats as indicators of the abstract “team goodness” concept produces a top-line “goodness” metric by weighting those performance stats in some way.
- This weighting system represents the internal structure of our “goodness” metric.
- Validity evidence based on internal structure tells you how adequately the weighting system underlying our abstract “goodness” metric fits directly quantifiable performance reality.
- If evidence shows adequate fit, then using said “goodness” metric to judge teams is valid; if not, it isn’t.
All of the above applies to individual metrics as well. Consider, for instance, the NFL’s Passer Rating, which turns directly observable quarterback (QB) stats (i.e., completions, attempts, yards, touchdowns, and interceptions) into an abstract concept called “QB goodness” via the following formula:1
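The league’s calculation can be sketched in Python. This is the standard NFL Passer Rating formula; the function name and example inputs are mine. Note that each of the four per-attempt components is clamped to the range [0, 2.375], which is why a “perfect” rating tops out at 158.3:

```python
def _clamp(x):
    """Each Passer Rating component is capped between 0 and 2.375."""
    return max(0.0, min(x, 2.375))

def passer_rating(comp, att, yards, tds, ints):
    """NFL Passer Rating from raw passing stats."""
    a = _clamp((comp / att - 0.3) * 5)      # completion-percentage component
    b = _clamp((yards / att - 3) * 0.25)    # yards-per-attempt component
    c = _clamp((tds / att) * 20)            # touchdown-rate component
    d = _clamp(2.375 - (ints / att) * 25)   # interception-rate component
    return (a + b + c + d) / 6 * 100

# A statistically perfect game maxes out all four components: 158.3
print(round(passer_rating(comp=20, att=25, yards=350, tds=4, ints=0), 1))
```

The multipliers in question (5, 0.25, 20, 25) are fixed constants baked into the formula, which is exactly what makes them testable against real data.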
Passer Rating’s weighting system, and therefore its internal structure, is simply the set of multipliers in the formula. Something like this analysis therefore constitutes internal structure validity evidence against using Passer Rating to judge QBs: it showed that the multipliers estimated from real NFL data a) change from season to season, and b) almost never match those in the official formula.
Here are some other examples of internal structure in NFL analytics:
- The weights for touchdowns and interceptions in Adjusted Yards per Attempt
- The weights for first downs and touchdowns in Adjusted Yards per Carry
- The logistic regression weights in Generic Win Probability
- The snap weights in Snap-Weighted Age
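Taking the first item as a concrete case, Adjusted Yards per Attempt makes its internal structure fully explicit: the touchdown and interception weights are fixed constants in the formula. A minimal sketch (the function name is mine):

```python
def adjusted_yards_per_attempt(yards, tds, ints, att):
    """AY/A: passing yards adjusted by fixed weights of
    +20 yards per passing TD and -45 yards per interception."""
    return (yards + 20 * tds - 45 * ints) / att

# A 4,000-yard, 30-TD, 10-INT season on 500 attempts:
# (4000 + 600 - 450) / 500 = 8.3
print(adjusted_yards_per_attempt(4000, 30, 10, 500))
```

Asking whether +20 and -45 match the values estimated from real play-by-play data is an internal structure question, just like the Passer Rating case above.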
The Internal Structure Witness Protection Program
Now, you might respond to the above by proclaiming, “But what about a ‘goodness’ metric like Success Rate? It doesn’t use weights, so none of this should apply, right?” Au contraire, mon frère! Success Rate (SR) and the like do, in fact, use weights; they’re just in hiding. You see, another name for unweighted is “equally weighted.” For example, if an offense produces 5 successes in 10 plays, we can report their SR in two equivalent ways:
- A 50% unweighted SR
- A 50% weighted SR with each play’s weight equaling 1/10
So actually, internal structure validity evidence is more necessary for stats like SR because they assume an internal structure (i.e., equally weighted plays) that’s in direct conflict with what we know about NFL reality.
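The equivalence is easy to verify in code. This is a minimal sketch with made-up play outcomes (5 successes in 10 plays, per the example above):

```python
successes = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # 5 successes in 10 plays
n = len(successes)

# The "unweighted" version: successes divided by plays
unweighted_sr = sum(successes) / n

# The same number as an explicitly weighted sum, each play weighted 1/n
weighted_sr = sum(s * (1 / n) for s in successes)

# Both are 0.5 (up to floating-point rounding): the equal weights
# were there all along, just hidden.
```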
Out of the Races and onto the Tracks
To take this a step further, I’m going to use SR as a clear example of what internal structure is and where evidence based on internal structure comes from.
Again, SR equals the number of successes s divided by the number of plays n, with each play getting a weight of 1/n. In real life, though, we know that a play’s down and distance affects its SR (e.g., success is easier on 3rd-and-short than on 3rd-and-long). Armed with these two pieces of information, we can draw a diagram of SR’s internal structure in a team context:2
In the diagram, there are seven down-and-distance situations on the left (e.g., “3L” is 3rd-and-long, “3M” is 3rd-and-mid, and so on) for each of the two team units on the right. Each line going from one object to another represents a separate aspect of the weighting system, and each formula along a line represents the weight. So for instance, a defense’s SR on 1st-and-10 (i.e., situation “1T”) impacts its total defense SR with a weight of 1/n.
Identifying SR’s internal structure is the first step; the second step is to identify the internal structure of reality:
As you can see, the only substantive difference is that we can’t know the real weights until we’ve collected some data and estimated those weights using statistics. To do that, we can use a general class of techniques called factor analysis; in this particular case, I would use confirmatory factor analysis (CFA). A ton of posts on CFA are in the offing, so I’ll save the details for those; the thing to understand for our SR example is simply that CFA can answer any of the following questions (among many others):
- What are the ???’s in the second diagram?
- Given our ??? estimates, is equal weighting a plausible system for calculating SR?
- Do the ???’s of the second diagram equal the 1/n’s in the first diagram?
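To give a feel for where the answers come from, here’s a toy simulation, not an actual CFA; the situation labels match the diagram, but the success probabilities are made-up numbers. It shows the kind of empirical quantities a CFA would work from: per-situation SRs that plainly differ, which an equal-weighting (1/n) structure ignores:

```python
import random

random.seed(0)

# Made-up "true" success probabilities by down-and-distance situation
# (1T = 1st-and-10, 3M = 3rd-and-mid, 3L = 3rd-and-long)
true_p = {"1T": 0.50, "3M": 0.45, "3L": 0.25}

# Simulate 1,000 plays per situation
plays = [(sit, 1 if random.random() < p else 0)
         for sit, p in true_p.items()
         for _ in range(1000)]

# Per-situation success rates: the raw material for estimating weights
by_sit = {}
for sit, success in plays:
    by_sit.setdefault(sit, []).append(success)
sit_sr = {sit: sum(v) / len(v) for sit, v in by_sit.items()}

# The equal-weighted aggregate treats every play as 1/n,
# collapsing over the very real situation-level differences
overall_sr = sum(s for _, s in plays) / len(plays)
```

A CFA goes further than this: it estimates the ???’s formally and provides fit statistics for testing whether equal weighting is plausible, rather than just eyeballing the per-situation gaps.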
Each answer would constitute validity evidence based on internal structure.
DT : IR :: TL : DR
Fundamentally, every advanced metric in NFL analytics involves using an observable quantity (e.g., yards gained) to measure an abstract concept (e.g., “team goodness”), so successfully translating observations to abstractions is of utmost importance if we don’t want to talk out of our asses.
For each metric, our observation-abstraction dictionary comes in the form of a weighting system, which is also called its internal structure. Statistical techniques like CFA test our dictionary’s accuracy, and the results of such tests constitute validity evidence based on internal structure. If our translation is accurate, then we can use said metric to confidently argue that “Team A is better than Team B” or “Player A is better than Player B.”