For Footballguys ($), I spent the latter half of the 2015 season calculating and writing about the probability that players, given their salaries and projected points that week, would achieve cash game or tournament value on the daily fantasy sports (DFS) site FantasyScore. And while I think the endeavor was theoretically sound and a practical success, a few things I've noticed while exploring both calibration and validation data over the past couple of months have led me to believe some refinement is necessary:

- Game-by-game fantasy scoring at every position doesn't seem to be normally distributed (i.e., bell-shaped, as IQ scores are), whether we sample across all players at a given position or only a more DFS-viable subset.
- In what should come as no surprise to regular readers, given how much I've written about multilevel modeling on this site, DFS scoring seems to operate at two levels: the *within*-player level (e.g., WR A scored more points this week than he did last week) and the *between*-player level (e.g., WR A has averaged more points than WR B over the past X years).
- Traditional heuristics regarding value (e.g., 2x in FanDuel cash games, 4x in DraftKings tournaments) seem outdated for a variety of reasons. And although my colleagues at Footballguys have created a value metric that's a vast improvement, a couple of additional tweaks are warranted.
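For readers new to the value heuristics above: "Nx value" conventionally means scoring N fantasy points per $1,000 of salary, so 2x on an $8,000 QB means 16 points. The Python sketch below turns that into a threshold and then into a probability of reaching it. The function names, the example salary, and the 6-point standard deviation are hypothetical, and the normal assumption is only a placeholder (it's exactly the assumption this series goes on to question), not a description of how the Footballguys system works.

```python
from statistics import NormalDist


def value_threshold(salary: int, multiplier: float) -> float:
    """Points needed for `multiplier`x value, i.e. `multiplier` points per $1,000 of salary."""
    return multiplier * salary / 1000


def prob_of_value(projection: float, sd: float, salary: int, multiplier: float) -> float:
    """P(score >= value threshold), ASSUMING scores are normal around the projection."""
    threshold = value_threshold(salary, multiplier)
    return 1 - NormalDist(mu=projection, sigma=sd).cdf(threshold)


# Hypothetical QB: $8,000 salary, 18-point projection, 6-point SD
print(value_threshold(8000, 2))                  # 16.0
print(round(prob_of_value(18, 6, 8000, 2), 3))  # chance of reaching 2x
```

Swapping in a different distribution only changes the `cdf` call, which is why the distributional question matters so much for the final probabilities.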

In this series of posts, I’ll be presenting the results of various statistical analyses that examine, and hopefully resolve, the three issues briefly described above.

## Game-By-Game QB Scoring Isn’t Normally Distributed…

Below is a chart showing the distribution of FanDuel scoring for 5,880 QB games played from 2007 to 2015:¹

For those who aren't familiar with this type of chart, which is called a histogram, here's what it's designed to show you. Each red bar represents how much of the sample (in this case, 5,880 QB game scores) lies within its corresponding range of FanDuel points; each range is about two points wide in this particular instance. The taller the bar, the more QB game scores fell in that range. Meanwhile, the black curve represents the normal distribution, which shows what we would *expect* the bar heights to look like if QB scoring were normally distributed.
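The comparison a histogram makes can be sketched in a few lines of Python: bin the scores into two-point ranges, then ask how many scores a normal curve with the sample's own mean and standard deviation would put in each range. This is a minimal sketch that assumes the curve is fitted by sample mean and SD (the post doesn't say how its curve was fitted); the function name and the toy scores are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def histogram_vs_normal(scores, bin_width=2.0):
    """Return (lower, upper, observed, expected) for each fixed-width bin.

    `expected` is the count that a normal curve fitted to the scores' mean and
    sample standard deviation would place in [lower, upper).
    """
    fitted = NormalDist(mean(scores), stdev(scores))
    top = max(scores)
    rows = []
    edge = bin_width * (min(scores) // bin_width)  # align bins to multiples of bin_width
    while edge <= top:
        upper = edge + bin_width
        observed = sum(edge <= s < upper for s in scores)
        expected = len(scores) * (fitted.cdf(upper) - fitted.cdf(edge))
        rows.append((edge, upper, observed, expected))
        edge = upper
    return rows


# Hypothetical scores: two near-zero "backup" games plus ordinary starts
toy_scores = [0.5, 1.0, 12.3, 15.8, 16.4, 17.1, 18.9, 21.0, 24.6, 30.2]
for lower, upper, observed, expected in histogram_vs_normal(toy_scores):
    print(f"{lower:5.1f}-{upper:5.1f}: observed {observed}, expected {expected:.1f}")
```

Bars that tower over their expected counts (like the 0-to-2 bin here) are exactly the mismatch described next.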

The obvious thing that jumps out at you is the overabundance of QB games between 0 and 2 points. These scores resulted from backups entering late or starters exiting early, neither of which is representative of a "typical" QB performance, especially for DFS purposes. And because of these outliers, the normal curve doesn't fit well.

Therefore, let's exclude those QB games and produce a histogram that includes only the Top 20 QB scores in each week from 2007 to 2015 (n = 3,060 QB games):
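The Top 20 restriction is a simple per-week filter. The Python sketch below assumes, hypothetically, that each game is a dict with `season`, `week`, and `points` keys; ties at the 20th spot are broken arbitrarily by the sort. The last line is a sanity check on the sample size: assuming a 17-week regular season in each of the nine seasons from 2007 to 2015, keeping 20 QBs per week gives exactly 3,060 games.

```python
from collections import defaultdict


def top_n_per_week(games, n=20):
    """Keep the n highest-scoring games within each (season, week).

    Assumes a hypothetical record format: dicts with "season", "week", "points".
    """
    by_week = defaultdict(list)
    for game in games:
        by_week[(game["season"], game["week"])].append(game)
    kept = []
    for week_games in by_week.values():
        week_games.sort(key=lambda g: g["points"], reverse=True)
        kept.extend(week_games[:n])
    return kept


# Sample-size check: 9 seasons x 17 regular-season weeks x 20 QBs per week
print(9 * 17 * 20)  # 3060
```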

The normal curve fits better for this restricted sample than it did for all QB games, but the fit still isn’t that great. Scores in the 15-to-18 point range are over-represented, while those in the 0-to-10 point and 20-to-28 point ranges are under-represented.

Alright then, maybe game-by-game QB scoring on FanDuel isn't normally distributed after all. In that case, the next suspect is the *log*normal distribution, which is also single-peaked but skewed to the right: most of its mass sits toward lower scores, while a long tail stretches toward high ones. Perhaps, if we take the same QB scores but make the black line represent a lognormal curve instead of a normal curve, the fitted curve will better match the observed scores. To wit:
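One simple way to fit a lognormal curve is to take the natural log of every score and fit a normal distribution to the logs; a bin's expected share is then the normal probability between the logged bin edges. The Python sketch below assumes that method (the post doesn't specify its fitting procedure), and it only works when every score is positive, which is one more practical reason to drop the near-zero games first. The moment-based `skewness` helper is included because a positive value is the signature of right skew that makes a lognormal a plausible candidate; all names and scores are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def fit_lognormal(scores):
    """Fit a lognormal by fitting a normal to log(scores); raises ValueError if any score <= 0."""
    logs = [math.log(s) for s in scores]
    return NormalDist(mean(logs), stdev(logs))


def lognormal_bin_prob(log_fit, lower, upper):
    """P(lower <= X < upper) under the fitted lognormal (zero probability at or below 0)."""
    def cdf(x):
        return 0.0 if x <= 0 else log_fit.cdf(math.log(x))
    return cdf(upper) - cdf(lower)


def skewness(xs):
    """Moment-based sample skewness: positive means a long right tail."""
    m, s = mean(xs), stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


# Hypothetical weekly Top 20 scores
scores = [9.8, 12.5, 14.1, 15.0, 15.6, 16.2, 17.4, 18.9, 21.3, 27.7, 33.0]
log_fit = fit_lognormal(scores)
print(round(skewness(scores), 2))                                   # > 0: right-skewed
print(round(len(scores) * lognormal_bin_prob(log_fit, 14, 16), 2))  # expected count in 14-16
```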

The key statistical takeaway here is that the lognormal curve fits game-by-game QB scoring about as well as one could hope for, and far better than the normal curve did. Sure, the 14-to-16-point range may be slightly over-represented and the 22-to-24-point range slightly under-represented, but that's a bit of nitpicking; gaps that small can plausibly be explained by sampling error.
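"Fits far better" can be put on a numerical footing with Pearson's chi-square statistic, which sums (observed − expected)² / expected across bins. The curve with the smaller statistic fits better, and comparing the statistic against a chi-square distribution with (bins − 1 − 2 fitted parameters) degrees of freedom, e.g. via `scipy.stats.chi2.sf`, indicates whether the leftover gaps are consistent with sampling error. This Python sketch computes only the statistic; the bin counts in the example are made up for illustration and are not taken from the charts. A common rule of thumb is to merge bins whose expected count falls below about 5 before testing.

```python
def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over bins; smaller values mean a closer fit."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of bins")
    if any(e <= 0 for e in expected):
        raise ValueError("every bin needs a positive expected count; merge sparse bins first")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_bins, n_fitted_params=2):
    """Both the normal and lognormal curves estimate two parameters from the data."""
    return n_bins - 1 - n_fitted_params


# Made-up counts for five bins, purely to show the comparison
observed = [30, 90, 120, 80, 40]
normal_expected = [45, 95, 110, 75, 35]
lognormal_expected = [32, 88, 118, 82, 40]
print(pearson_chi_square(observed, normal_expected))
print(pearson_chi_square(observed, lognormal_expected))
print(degrees_of_freedom(len(observed)))  # 2
```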

## …But Player-By-Player QB Scoring *Is* Normally Distributed

Hopefully, I've convinced you via the above chart that FanDuel scoring for Top 20 QB games in a given week isn't normally distributed; rather, it's *log*normally distributed. The story doesn't end there, however. Recall that I've previously shown that NFL data are usually clustered (or nested). In the context of players, this means that a) the performance of a given player in a given game tends to cluster around his own average performance, and b) player averages tend to cluster around league-average performance. Therefore, in addition to examining the distribution of QB games, it's wise to also examine the distribution of QB averages.
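The two levels of clustering can be illustrated with a crude variance decomposition: the variance of player averages is the between-player part, the average spread of each player's games around his own mean is the within-player part, and the between-player share of the total is a rough intraclass correlation. This Python sketch is a simple method-of-moments illustration, not a multilevel model; it weights every player equally regardless of games played, and the names and numbers are hypothetical.

```python
from statistics import mean, pvariance


def variance_components(scores_by_player):
    """Split score variance into between-player and within-player parts.

    Returns (between, within, icc), where icc = between / (between + within)
    is the share of variance attributable to which player is playing.
    """
    player_means = {p: mean(s) for p, s in scores_by_player.items()}
    between = pvariance(list(player_means.values()))
    within = mean(pvariance(s, mu=player_means[p]) for p, s in scores_by_player.items())
    return between, within, between / (between + within)


# Hypothetical game logs for two QBs
logs = {"QB A": [10.0, 12.0, 14.0], "QB B": [20.0, 22.0, 24.0]}
between, within, icc = variance_components(logs)
print(between, within, round(icc, 3))
```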

To that end, the chart below shows what happens when you take the same data used to create the "Weekly Top 20" charts above and average it for each QB over his time with a given NFL team (n = 207 QBs). For instance, whereas the previous charts included separate data points for Cam Newton's 16.3 FanDuel points in Week 16 of Carolina's 2015 season and his 35.3 FanDuel points in Week 14 of their 2014 season, the following chart includes only one data point: Newton's 23.5-point average across the 68 games in which he finished among the Top 20 for the week:
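Collapsing games into per-QB averages is a group-by. The Python sketch below assumes, hypothetically, that each game is a dict with `player`, `team`, and `points` keys, and groups by QB and team so that a QB who changed teams gets one average per team, matching the description above. The example uses only the two Newton games cited in the text, so its average differs from his 23.5-point figure over all 68 games.

```python
from collections import defaultdict
from statistics import mean


def average_by_qb_team(games):
    """Map (player, team) -> (average points, number of games).

    Assumes a hypothetical record format: dicts with "player", "team", "points".
    """
    grouped = defaultdict(list)
    for game in games:
        grouped[(game["player"], game["team"])].append(game["points"])
    return {key: (mean(points), len(points)) for key, points in grouped.items()}


# Two of Newton's Top 20 games cited above (team code format is hypothetical)
games = [
    {"player": "Cam Newton", "team": "CAR", "season": 2015, "week": 16, "points": 16.3},
    {"player": "Cam Newton", "team": "CAR", "season": 2014, "week": 14, "points": 35.3},
]
print(average_by_qb_team(games))
```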

From the black line, which now once again represents the normal curve, it looks like QB averages — unlike QB games — do indeed resemble a normal distribution, with a mean of 17.8 FanDuel points and a standard deviation of 3.2 points.
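Once the averages are summarized by a normal curve, placing any QB on it takes one line. The Python sketch below plugs in the mean (17.8) and standard deviation (3.2) reported above and Newton's 23.5-point average; `NormalDist.zscore` requires Python 3.9 or later. Treat the output as illustrative, since it ignores the uncertainty in each QB's average (a 68-game average is far more reliable than a 5-game one).

```python
from statistics import NormalDist

# Normal curve fitted to per-QB FanDuel averages, as reported above
qb_averages = NormalDist(mu=17.8, sigma=3.2)

newton_z = qb_averages.zscore(23.5)        # SDs above the typical QB average
newton_pct = qb_averages.cdf(23.5)         # share of QB averages expected below 23.5
share_above_21 = 1 - qb_averages.cdf(21)   # share expected to average more than 21 points

print(f"z = {newton_z:.2f}, percentile = {newton_pct:.1%}, share above 21 = {share_above_21:.1%}")
# z = 1.78, percentile = 96.3%, share above 21 = 15.9%
```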

But what’s driving these divergent findings? And how do we apply them in practice? I’ll leave that discussion for after I’ve presented distributional analyses for all positions.

## DT : IR :: TL : DR

Because I've found it to be a useful guide for constructing DFS lineups, I've undertaken a project to further refine the value probability system I wrote up on Footballguys this past season. The first step in that project is distributional analysis, and the first position I examined was QB. Based on FanDuel data from 2007 to 2015, scoring across 3,060 individual Top 20 QB games follows a lognormal distribution, while scoring averages for the 207 QBs who played in those games follow a normal distribution. Explanations, implications, and applications are yet to come.

¹ Yes, I'm magically switching from FantasyScore to FanDuel here, but that's only because the vast majority of my real-life DFS participation is on FanDuel. Shoot me for being self-serving.