Hi all, I’m working on adapting the rugby hierarchical model example for (American) fantasy football.
My intention is the following (and I think I successfully did this in my code below? Running locally I get almost no divergences):
- Goal: for a specified position (ex: only quarterbacks), get latent variables describing a player’s ability to score points
- Background: in fantasy football, you draft a roster of players that score points each week depending on how they play in the actual football game. Specifically, prior to the season, players are attributed overall projected points (`float` dtype) that encompass the entire season. My intention is to use these season projected points, coupled with inference based on the season schedule, to ultimately get per-game projected points.
- Method: Like the rugby example, I am using a log-linear random effect model. For a given player in a given game, $\log \theta_{\text{player, game}} = \text{player}_{\text{player}} + \text{off}_{\text{player team}} + \text{def}_{\text{opponent team}}$. Then, for the entire season, $\theta_{\text{player}} = \sum_{\text{game}} \theta_{\text{player, game}}$
  - $\text{player}_{\text{player}}$ represents an individual player’s contribution; it is supposed to capture effects not related to the broader teams that are playing
  - $\text{off}_{\text{player team}}$ represents the overall contribution of the rest of the player’s team (they’re on offense); ex: a quarterback earns points only if the player they pass to catches it, a running back only earns points if the team blocks the opponent, etc.
  - $\text{def}_{\text{opponent team}}$ represents the overall contribution of the opposing team (they’re on defense); ex: a great running defense makes it very difficult for a running back to get points
- Notes: points cannot be negative, hence my choice for a truncated normal likelihood. However, this may not be appropriate given that some players have projections of zero points for the season - they’re usually bench players that (at least in a fully healthy season for the rest of the team) won’t see game time. It may ultimately be worth filtering the data above a minimum threshold of projected points if it enables the model to sample much better.
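To make the method concrete, here is a minimal pure-Python sketch of the arithmetic I have in mind. All names and numbers (`qb_a`, `team_x`, the effect values) are made up for illustration and are not fitted values or my real data, and the intercept is left out to match the formula as written above.

```python
import math

# Toy, made-up effects on the log scale (hypothetical, not fitted values).
player_effect = {"qb_a": 0.4, "qb_b": -0.2}
off_effect = {"team_x": 0.1, "team_y": -0.1}
def_effect = {"team_x": 0.3, "team_y": -0.3}

# One row per (player, game): player, player's team, opponent team.
games = [
    ("qb_a", "team_x", "team_y"),
    ("qb_a", "team_x", "team_y"),
    ("qb_b", "team_y", "team_x"),
]


def weekly_theta(player, team, opponent):
    """Per-game expected points: exp(player + off + def)."""
    log_theta = player_effect[player] + off_effect[team] + def_effect[opponent]
    return math.exp(log_theta)


# Season-level theta is the sum of the per-game thetas for each player.
season_theta = {}
for player, team, opponent in games:
    season_theta[player] = season_theta.get(player, 0.0) + weekly_theta(
        player, team, opponent
    )
```

Because the sum happens on the natural scale (after exponentiating), the season total is not itself log-linear in the effects, which is part of why I'm unsure where the intercept belongs.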
My overall model has the following structure. Please see the code linked below for the nuances of the variables, but they again follow a similar idea as in the rugby example.
The above graph was generated for the quarterback position. There are 105 quarterbacks in the dataset. There are 32 teams in the NFL, and each team plays 17 games. 17 * 105 = 1785, which is the size of the dataset and where the dimensionality for `theta_weekly` comes from. Aside: one game is played by each team per week, so “weekly” makes sense as a variable name.
Questions
- What tricks can I do to help the model sample faster? I imagine that my priors aren’t very informative, and my likelihood function may not even be appropriate. And that’s assuming my summation from `theta_weekly` to `theta` actually did what I was after.
- I tried to get creative with the summation from `theta_weekly` to `theta`. Is there a way to achieve this purely with dimensions? I think the tricky part is that not every player plays every week, so I wasn’t sure how to ultimately get things to work.
- Does it make more sense to move the `intercept` from `theta_weekly` to `theta`? I kept this in the model based on the rugby example, but since I have an extra hierarchy I’m starting to think that I should move it to `theta`.
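To clarify what I mean by the summation from `theta_weekly` to `theta`, here is a minimal NumPy sketch of the operation on toy data. The array names `player_idx` and `indicator` and all the numbers are hypothetical, and this is plain NumPy rather than my actual model code. It shows two ways to sum long-format rows per player when players have different numbers of games.

```python
import numpy as np

n_players = 3

# Long format: one entry per (player, game) row.
# Player 2 has only one row, i.e. they missed a week (toy data).
player_idx = np.array([0, 0, 1, 1, 2])
theta_weekly = np.array([10.0, 12.0, 5.0, 7.0, 3.0])

# Option 1: segment sum, adding each row's value into its player's slot.
theta_bincount = np.bincount(
    player_idx, weights=theta_weekly, minlength=n_players
)

# Option 2: an (n_players x n_rows) indicator matrix times the weekly vector.
# Row p has a 1 in every column whose row belongs to player p.
indicator = np.zeros((n_players, player_idx.size))
indicator[player_idx, np.arange(player_idx.size)] = 1.0
theta_matmul = indicator @ theta_weekly
```

Both give one total per player regardless of how many games each player has. My guess is that the indicator-matrix version would be the easier one to carry into the model since it is just a matrix product, but I haven't confirmed that it's the recommended approach or that it helps sampling speed.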
Fully-interactive Google Colab to help me troubleshoot, data included: link