A couple of weeks ago, Over The Cap's Bryce Johnston and Nicholas Barton (J&B) introduced a new metric called Expected Contract Value (ECV), which they described as follows:
Expected Contract Value determines the probability that a player will remain under contract, for each season of the contract, despite the non-guaranteed nature of the contract.
ECV has received praise from the NFL analytics corner of the internet. Brian Burke of Advanced Football Analytics called it “brilliant,” while Aaron Schatz of Football Outsiders called it a “great series.”
But from a methodological perspective, is this praise deserved? That’s the question I’ll be exploring today in this foray into reviewing online NFL research.
The Basics of ECV
Paraphrasing J&B, media discussions about contracts have traditionally centered around guaranteed money, three-year payout, and average money per year (APY); objective metrics to be sure, but flawed metrics just the same. Guaranteed money and three-year payout underestimate the value of a contract because they assume a 0% chance of players earning money beyond what’s guaranteed or what’s in Years 1-3, respectively. In contrast, APY tends to overestimate contract value because it assumes that each dollar has an equal chance of being earned.
To remedy this, J&B created a logistic regression model that spits out the probability that a player will be under contract in a given year, based on the following six factors:
- Savings per Cap Dollar: “The amount of cap savings that the team would realize upon releasing a player [divided by] the player’s cap number if the team does not release him.”
- Cap Dollars per APY: “The player’s cap number if the team does not release him [divided by] the [APY] of the player’s contract.”
- Savings per APY: “The amount of cap savings that the team would realize upon releasing the player [divided by] the APY of the player’s contract.”
- Contract Completion Percentage: “The number of years of the contract that have been completed [divided by] the total number of years of the contract.”
- Peak-Relative Age: “The number of seasons the player has played [divided by] the number of seasons that a player in the theoretical peak football player age has played.”
- Dead Money Percentage: “The amount of potential dead money upon a release that could be deferred to the following year if the release is done as a post-June-1st release [divided by] the total potential dead money upon a release.”
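J&B don't publish their code or coefficients, but the structure of a model like this is straightforward. Here is a minimal sketch in Python with scikit-learn; the data, coefficients, and value ranges below are all made up for illustration, not J&B's actual sample or estimates:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500  # simulated contract seasons (hypothetical data, not J&B's sample)

# One column per ECV predictor, drawn from plausible-looking ranges.
X = np.column_stack([
    rng.uniform(0.0, 1.0, n),   # Savings per Cap Dollar
    rng.uniform(0.5, 2.0, n),   # Cap Dollars per APY
    rng.uniform(0.0, 2.0, n),   # Savings per APY
    rng.uniform(0.0, 1.0, n),   # Contract Completion Percentage
    rng.uniform(0.5, 1.5, n),   # Peak-Relative Age
    rng.uniform(0.0, 1.0, n),   # Dead Money Percentage
])

# Simulated outcome: 1 if the player remained under contract that season.
# The "true" coefficients here are invented for the simulation.
true_logit = 1.5 - 2.0 * X[:, 0] - 1.5 * X[:, 3]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]  # P(under contract) for each season
```

The output of `predict_proba` is the per-season probability that ECV reports: a number between 0 and 1 for each contract season, rather than a yes/no call.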
Strengths of ECV
The main feature of ECV is that it’s a pioneering attempt to improve discourse about NFL player contracts. That’s no small feat, and one J&B specifically mentioned as the main purpose of their research.
Another strength of ECV is that it’s probabilistic. Nothing in life is certain, so we should phrase our predictions about real life in terms of likelihoods. Brian Burke and Pro Football Reference do this with their in-game win probability models. Aaron Schatz does this with projected win totals in Football Outsiders Almanac. It’s good to see J&B doing the same with ECV.
The third strength of ECV is that it provides an objective way of assessing player contracts without setting foot in the snake pit that is player performance analytics. Should we incorporate AV across positions? Should we incorporate Pro Football Focus grades across positions? Should we incorporate DYAR for offensive positions? To their credit, J&B answered these questions with, “none of the above,” and instead focused on creating a metric that avoided performance evaluation altogether.
Weaknesses of ECV
These aren’t weaknesses, per se. Rather, they’re methodological issues that J&B (presumably because of their non-technical audience) didn’t address in their series, but that nonetheless require explanation if we’re to trust ECV as a metric to use in our discussions about player contracts.
Multicollinearity and Overfitting
The first three predictors in the ECV model are simple arithmetic transformations of one another, and that’s a recipe for multicollinearity: Savings per APY is just Savings per Cap Dollar multiplied by Cap Dollars per APY, so any two of the three determine the third.
And where there’s multicollinearity, there’s usually overfitting. It came as no surprise, then, when I read
We can account for approximately 80% of real-life decision-making.
It’s possible that ECV owes its high level of in-sample accuracy to the fact that three of its predictors are redundant.
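The standard diagnostic here is the variance inflation factor (VIF). The pure-NumPy sketch below uses simulated data (not J&B's sample) to build the three ratio predictors so that Savings per APY is literally the product of the other two, then shows their VIFs blowing past the usual rule-of-thumb cutoff of about 5:

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j of X on the remaining columns (plus an intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
n = 300  # simulated contract seasons
savings_per_cap = rng.uniform(0.1, 1.0, n)
cap_per_apy = rng.uniform(0.5, 2.0, n)
savings_per_apy = savings_per_cap * cap_per_apy  # exact arithmetic link

X = np.column_stack([savings_per_cap, cap_per_apy, savings_per_apy])
print(vif(X))  # the product column lands well above the ~5 rule of thumb
```

Had J&B reported VIFs this size for their actual predictors, it would confirm the redundancy concern; reporting small ones would dispel it.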
Absence of Cross-Validation
Speaking of which, the way J&B described their sample suggests that they did not cross-validate their model with an out-of-sample test. This affects the trustworthiness of ECV in at least two ways. First, it deprives us of potential evidence about multicollinearity and overfitting. If these problems existed, they would have revealed themselves via decreased accuracy in cross-validation.
Second, it deprives us of potential evidence about whether ECV’s predicted probabilities are well calibrated, a cause that, perhaps ironically, Brian Burke has championed for years. When J&B predict future players to have an 80% chance of remaining under contract in a given year, will those players actually remain under contract (around) 80% of the time? Without cross-validation evidence, we can’t answer that incredibly important question.
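Checking calibration doesn’t require anything fancy: with held-out predictions in hand, it amounts to binning them by predicted probability and comparing each bin’s average prediction to the observed rate. A minimal sketch, using simulated holdout data (since we don’t have J&B’s) that is calibrated by construction:

```python
import numpy as np

def calibration_table(y_true, y_prob, bins=5):
    """Bin predictions by predicted probability and compare each bin's
    mean prediction with the observed event rate."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y_prob >= lo) & ((y_prob < hi) | (hi == 1.0))
        if mask.any():
            rows.append((y_prob[mask].mean(), y_true[mask].mean(),
                         int(mask.sum())))
    return rows

# Simulated "holdout" seasons: each event fires with exactly its stated
# probability, so predicted and observed rates should track each other.
rng = np.random.default_rng(2)
p = rng.uniform(0.05, 0.95, 2000)
y = (rng.uniform(size=2000) < p).astype(int)

for pred, obs, count in calibration_table(y, p):
    print(f"predicted {pred:.2f} vs. observed {obs:.2f} (n={count})")
```

A well-calibrated model shows predicted and observed rates agreeing in every bin; a table like this built from out-of-sample ECV predictions is exactly the evidence the series is missing.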
As I’ve written about previously, NFL analytics has a general problem of ignoring the regression assumption that one row of data isn’t dependent on another. J&B appear to have fallen into that trap:
Each contract season is a single input record. So if a given player’s contract covered five seasons from 2005-2009, this resulted in five different contract seasons for the purpose of creating input records.
The simple fix here would be to calculate their sample’s intraclass correlation coefficient (ICC) and use multilevel modeling (MLM) if the resulting design effect, 1 + (m − 1) × ICC, where m is the average number of seasons per player, is larger than 2.0.
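For concreteness, here’s a small NumPy sketch of that check, using the one-way ANOVA estimator of the ICC and the Kish design effect on simulated player-season data. The player effects and the continuous stand-in outcome are invented for illustration; this isn’t J&B’s sample:

```python
import numpy as np

def icc_and_deff(values, groups):
    """One-way ANOVA intraclass correlation plus the Kish design effect,
    deff = 1 + (m - 1) * ICC, with m the average cluster size."""
    values, groups = np.asarray(values, float), np.asarray(groups)
    labels = np.unique(groups)
    k, n = len(labels), len(values)
    sizes = np.array([(groups == g).sum() for g in labels])
    means = np.array([values[groups == g].mean() for g in labels])
    grand = values.mean()
    ms_between = (sizes * (means - grand) ** 2).sum() / (k - 1)
    ms_within = sum(((values[groups == g] - means[i]) ** 2).sum()
                    for i, g in enumerate(labels)) / (n - k)
    m = (n - (sizes ** 2).sum() / n) / (k - 1)  # ANOVA average cluster size
    icc = (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)
    deff = 1 + (sizes.mean() - 1) * icc
    return icc, deff

# 100 simulated players x 5 contract seasons each; a player-level effect
# makes seasons from the same player correlated, as in J&B's data setup.
rng = np.random.default_rng(3)
players = np.repeat(np.arange(100), 5)
player_effect = np.repeat(rng.normal(0, 1, 100), 5)
outcome = player_effect + rng.normal(0, 1, 500)

icc, deff = icc_and_deff(outcome, players)
print(f"ICC = {icc:.2f}, design effect = {deff:.2f}")  # deff > 2 => use MLM
```

In this simulation roughly half the outcome variance sits at the player level, and with five seasons per player the design effect comfortably clears 2.0, which is exactly the situation where treating each contract season as an independent row overstates the effective sample size.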
DT : IR :: TL : DR
The positives of ECV are
- It’s pioneering and improves discourse.
- It’s probabilistic.
- It’s not dependent on the vagaries of player performance measurement.
Questions that I’d like J&B to answer about ECV are
- What is their evidence that multicollinearity and overfitting aren’t present in their results?
- What is their evidence that predicted probabilities are calibrated well?
- What is the ICC and design effect associated with their study?