I’ve a couple of non-PyMC3 questions about the Rugby Notebook Example I’m hoping someone can answer for me.
The notebook mentions log-scaling the scores, which makes sense to me. However, the observed_home_goals and observed_away_goals variables are set before any log transformation takes place. In fact, the only time a log calculation is used is
att_starting_points = np.log(g.away_score.mean())
but then that variable is never used again. Similarly, I assume the exponentiation function is used in
home_theta = tt.exp(intercept + home + atts[home_team] + defs[away_team])
because of the mentioned log transformation?
Can someone explain what I’m missing?
Thanks!
pinging @springcoil who wrote the example.
I appreciate you trying to bring the original author into the loop. I’m looking forward to his replies. As I’ve been looking at the notebook more, I have another question I’m hoping I can tack onto the conversation here:
Let’s say I wanted to modify the model to simulate game results as a function of each player’s individual offensive contributions (and let’s also pretend for the moment that this data, i.e. each player’s individual points scored, is available).
Would something like the following be feasible?
with pm.Model() as model:
    # global model parameters
    home = pm.Flat('home')
    sd_att = pm.HalfStudentT('sd_att', nu=3, sd=2.5)
    sd_def = pm.HalfStudentT('sd_def', nu=3, sd=2.5)
    intercept = pm.Flat('intercept')
    # team/player-specific model parameters
    atts_star = pm.Normal("atts_star", mu=0, sd=sd_att, shape=N_PLAYERS)
    defs_star = pm.Normal("defs_star", mu=0, sd=sd_def, shape=num_teams)
    atts = pm.Deterministic('atts', atts_star - tt.mean(atts_star))
    defs = pm.Deterministic('defs', defs_star - tt.mean(defs_star))
    home_player0_theta = tt.exp(atts[home_player0] + defs[away_team])
    .
    .
    .
    home_player14_theta = tt.exp(atts[home_player14] + defs[away_team])
    home_theta = tt.exp(intercept + home + home_player0_theta + ... + home_player14_theta)
    home_player0_points = pm.Poisson('home_player0_points', mu=home_player0_theta, observed=obs_home_player0_points)
    home_points = pm.Poisson('home_points', mu=home_theta, observed=observed_home_goals)
Obviously I’ve left some parts of the model out, but would this be at all feasible? The idea/motivation behind it being you could simulate the results of the game in more detail, getting not only a predicted final score but also the likelihood of each player’s contribution. This could potentially allow you to account for changes in roster due to injury or trade.
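One way to sanity-check the player-level idea outside PyMC3 is a small NumPy simulation. This is only a sketch under stated assumptions, not the notebook's model: all parameter values are made up, and `intercept` and `home` are placed inside each player's log-rate (an assumption about where they belong). It relies on the fact that a sum of independent Poisson counts is itself Poisson with the summed rate, so the team rate here is the plain sum of the player rates. Note that this differs from the `home_theta` line in the post, which applies `tt.exp` to a sum of terms that are already exponentiated.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes and parameter values, for illustration only.
n_players = 15
intercept, home = 0.5, 0.2
def_away = -0.1                        # defence effect of the away team

# Player attack effects, centred to sum to zero as in the post.
atts = rng.normal(0.0, 0.3, size=n_players)
atts = atts - atts.mean()

# Each player's scoring rate lives on the log scale; exp makes it positive.
player_theta = np.exp(intercept + home + atts + def_away)

# If player points are independent Poisson counts, the team total is
# Poisson with rate equal to the sum of the player rates.
team_theta = player_theta.sum()

# Simulate many games: one column of points per player.
n_games = 100_000
player_points = rng.poisson(player_theta, size=(n_games, n_players))
team_points = player_points.sum(axis=1)

print("implied team rate:   ", team_theta)
print("simulated team mean: ", team_points.mean())
print("simulated team var:  ", team_points.var())
```

For a Poisson total the simulated mean and variance should both sit close to the implied team rate. Dropping a player's column from the sum gives a rough picture of how a roster change would shift the predicted score under these assumptions.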
I believe the exponentiation is indeed what you think. Your second model looks reasonable to me, but I’d need to spend some time thinking about it to give you a decent answer.
I think there is indeed a mistake (so feel free to do a PR to fix this).
atts_star = pm.Normal("atts_star", mu=0, sd=sd_att, shape=num_teams)
should be
atts_star = pm.Normal("atts_star", mu=0, sd=sd_att, shape=num_teams, value=att_starting_points.values)
And the defs_star also adjusted.
I was wrong. You don’t need the att_starting_points.values at all. I think there’s a log transformation automatically.
Similarly, I assume the exponentiation function is used in
home_theta = tt.exp(intercept + home + atts[home_team] + defs[away_team])
because of the mentioned log transformation? – Correct.
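To make the role of the exponentiation concrete, here is a minimal NumPy sketch, assuming the usual log-link reading of a Poisson model: the sum `intercept + home + atts[home_team] + defs[away_team]` lives on the log scale, `exp` maps it to a positive rate, and the observed scores stay as raw counts with no log applied to them. All numbers below are hypothetical and are not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values for three teams, for illustration only.
intercept = 3.0
home = 0.2
atts = np.array([0.3, -0.1, -0.2])    # attack effects (log scale)
defs = np.array([-0.2, 0.1, 0.1])     # defence effects (log scale)

# Three fixtures, given as indices into the team arrays.
home_team = np.array([0, 1, 2])
away_team = np.array([1, 2, 0])

# Linear predictor: can be any real number, so it is treated as a log-rate.
log_rate = intercept + home + atts[home_team] + defs[away_team]

# exp undoes the log, giving a strictly positive Poisson rate per fixture.
home_theta = np.exp(log_rate)

# The scores themselves are plain counts drawn at that rate.
simulated_home_points = rng.poisson(home_theta)

print("log-rates:", log_rate)
print("rates:    ", home_theta)
print("scores:   ", simulated_home_points)
```

Under this reading, the parameters are what is "log-scaled", which would explain why `observed_home_goals` and `observed_away_goals` are passed in untransformed.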