In the first post of this series, I showed that game-by-game QB scoring on FanDuel is lognormally distributed, while QB-by-QB average scoring is normally distributed. Today, I’ll present the same type of distributional analysis for RB scoring.
Game-By-Game RB Scoring Isn’t Normally Distributed…
Adopting the same analytical approach I took with QBs, I first plotted a histogram of all 17,576 RB game scores on FanDuel from 2007 to 2015:
As was the case with QBs, this graph also shows that a) there’s an overabundance of 0-to-2 point games (as indicated by the first two red bars), and b) game scores don’t come close to following a normal distribution (as indicated by the black curve).
So once again, I removed those unrepresentative outliers and plotted a histogram of RB game scores that placed in the Top 50 of a given week over the past nine seasons (n = 7,650):
As the black curve shows, a restricted sample of game-by-game RB scores still isn’t normally distributed. If you read my post about QBs, you know what question I answered next: Is it lognormally distributed? The answer is below:
From the looks of it, yes, game-by-game RB scoring on FanDuel appears to be lognormally distributed. Keen observers will point out that the red bars representing the number of scores in the 7-to-12 point range fall short of what’s expected from the black, lognormal curve. This could be due to any number of reasons. One major suspect is our old friend, sampling error. A second suspect, though, is my subjective decision to go with “Weekly Top 50.” Perhaps restricting the sample to “Weekly Top 40” would produce a lognormal curve that fits better. Below is a histogram of this further-restricted data set (n = 6,120):
It turns out that restricting the sample to “Weekly Top 40” doesn’t just fail to produce a better fit; it actually produces a worse one. In contrast to the “Weekly Top 50” histogram, where only an isolated 5-point range deviated from the lognormal curve, this one shows too few scores in the 10-to-17 point range and too many scores in the 7-to-9 point range. Taken together, that’s a much larger deviation, and one I’m not comfortable carrying forward into the application stages of this series.
And just to hammer the point home, here’s what the histogram looks like if we push the restriction much further, i.e., “Weekly Top 20” (n = 3,060):
As you can see, we now have fit problems (i.e., red bars being far higher or far lower than the black curve) across nearly the entire range of game scores between 9 points and 27 points, with only a few exceptions.
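The three restricted samples above all come from the same operation: keep the N best RB games in each week. Here is a minimal Python sketch of that step, assuming each game is a dict with hypothetical `season`, `week`, and `points` keys (the actual data layout isn't shown in this post). It also checks that the quoted sample sizes are consistent with 153 weeks, i.e., nine seasons of 17 weeks each, which is an inference from the n values rather than something stated outright.

```python
from collections import defaultdict


def weekly_top_n(games, n=50):
    """Keep the n highest-scoring games within each (season, week).

    `games` is a list of dicts with hypothetical keys
    'season', 'week', and 'points'.
    """
    by_week = defaultdict(list)
    for game in games:
        by_week[(game["season"], game["week"])].append(game)

    kept = []
    for week_games in by_week.values():
        # Sort descending by points; ties at the cutoff are broken arbitrarily.
        ranked = sorted(week_games, key=lambda g: g["points"], reverse=True)
        kept.extend(ranked[:n])
    return kept


# Assumption: 9 seasons (2007-2015) x 17 weeks per season = 153 weeks.
WEEKS = 9 * 17
sample_sizes = {n: WEEKS * n for n in (50, 40, 20)}
print(sample_sizes)
```

The cutoff `n` is the subjective knob discussed above, so it makes sense to keep it as a parameter rather than hard-coding 50.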
In short, based on the analyses above, it’s most reasonable to conclude that game-by-game RB scoring on FanDuel is best represented by a lognormal curve based on the Top 50 performers in a given week.
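For readers who want to try this kind of check themselves, here is a rough sketch of one way to fit a lognormal curve and compare a histogram bar against it. It is not necessarily how the curves in this post were produced. The idea: if scores are lognormal, then log(score) is normal, so the two parameters can be estimated as the mean and standard deviation of the logged scores. The data below is synthetic stand-in data, not real FanDuel scores, and the function names are made up for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def fit_lognormal(scores):
    """Estimate (mu, sigma) as the mean and SD of log(score)."""
    logs = [math.log(s) for s in scores if s > 0]  # log needs positive scores
    return mean(logs), stdev(logs)


def expected_count(lo, hi, mu, sigma, n):
    """Expected number of scores in [lo, hi) under lognormal(mu, sigma)."""
    log_dist = NormalDist(mu, sigma)
    return n * (log_dist.cdf(math.log(hi)) - log_dist.cdf(math.log(lo)))


# Synthetic stand-in for the "Weekly Top 50" sample (assumed parameters).
rng = random.Random(0)
scores = [rng.lognormvariate(2.4, 0.5) for _ in range(7650)]

mu, sigma = fit_lognormal(scores)

# Compare one histogram range (7 to 12 points) against the fitted curve.
observed = sum(1 for s in scores if 7 <= s < 12)
expected = expected_count(7, 12, mu, sigma, len(scores))
print(f"mu={mu:.3f} sigma={sigma:.3f} observed={observed} expected={expected:.0f}")
```

Running the same comparison for each bar, and for each Top-N cutoff, gives a numeric counterpart to eyeballing red bars against the black curve.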
…But Player-By-Player RB Scoring Is Normally Distributed
As was the case with my distributional analysis of FanDuel QB scoring, the next step was to average game scores across RBs and plot a histogram of those averages. Again, for the sake of clarification, the histogram below (n = 554) is based on the exact same data as the “Weekly Top 50” histogram above, except it’s showing averages rather than individual games. So, for instance, whereas the earlier plot included separate data points for Adrian Peterson’s 30.7-point game in Week 8 of Minnesota’s 2011 season and his 11.9-point game in Week 8 of their 2015 season, the plot below includes one data point for Peterson’s 18.0-point average across the 116 Vikings games in which he finished among the Top 50 RBs of the week:
The normal curve appears to fit RB averages reasonably well, especially if one allows for sampling error. That said, it doesn’t fit as well as I’d like, particularly in comparison to its fit for QB averages, and so I felt compelled to plot the exact same data, but overlay the lognormal curve instead:
To the naked eye, the lognormal curve actually appears to fit better. But here’s the thing: A statistical test of distributional fit suggests that the difference between normal and lognormal is negligible. For the purposes of application (which is the whole point of statistics, in my mind), the data suggests that one can reasonably choose to go either way.
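The specific test isn't named here, so the sketch below swaps in a simple alternative: fit both curves by maximum likelihood and compare their log-likelihoods. Because normal and lognormal each have two parameters, the raw log-likelihoods are directly comparable. The sketch also includes the per-player averaging step. All data is synthetic, and the player labels and function names are hypothetical.

```python
import math
import random
from collections import defaultdict
from statistics import mean, pstdev


def player_averages(games):
    """Collapse (player, points) game rows into one average per player."""
    by_player = defaultdict(list)
    for player, points in games:
        by_player[player].append(points)
    return {player: mean(pts) for player, pts in by_player.items()}


def _gauss_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


def normal_loglik(xs):
    """Log-likelihood of xs under the maximum-likelihood normal fit."""
    mu, sigma = mean(xs), pstdev(xs)
    return sum(_gauss_logpdf(x, mu, sigma) for x in xs)


def lognormal_loglik(xs):
    """Log-likelihood of xs under the maximum-likelihood lognormal fit."""
    logs = [math.log(x) for x in xs]
    mu, sigma = mean(logs), pstdev(logs)
    # Change of variables: density of x is density of log(x) divided by x.
    return sum(_gauss_logpdf(lx, mu, sigma) - lx for lx in logs)


# Synthetic stand-in: 554 players with varying skill and games played.
rng = random.Random(42)
games = []
for i in range(554):
    skill = max(rng.gauss(11.0, 3.0), 3.0)
    for _ in range(rng.randint(1, 60)):
        games.append((f"RB{i}", skill * rng.lognormvariate(0.0, 0.4)))

averages = list(player_averages(games).values())
gap_per_player = (lognormal_loglik(averages) - normal_loglik(averages)) / len(averages)
print(f"log-likelihood gap per player (lognormal minus normal): {gap_per_player:.4f}")
```

A gap near zero means the data can't really tell the two curves apart, which is the situation described above. A clearly positive gap would favor lognormal, and a clearly negative one would favor normal.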
For various reasons, which I’ll get into once I’m done presenting my distributional analyses for all positions, I’m choosing normal over lognormal.
(As an aside, this is another good example of an underacknowledged situation that comes up pretty often in research: one has to make a subjective decision that heavily impacts future analyses and applications, even though science is supposed to be, and is widely portrayed as, a completely objective endeavor.)
DT : IR :: TL : DR
Mimicking what I did for the QB post in this series, I examined FanDuel scoring distributions for RBs, both on a game-by-game and player-by-player basis. Once again, I found that game scores are lognormally distributed (based on the Top 50 performers in a given week), while player averages can reasonably be treated as normally distributed.