I was wondering how one might go about improving the model if additional data were available in the Rugby dataset.
https://docs.pymc.io/en/v3/pymc-examples/examples/case_studies/rugby_analytics.html
For example, suppose the original dataset had an extra column containing the difference in team quality. How could this be added to the model with its own global parameter?
I have tried the following updates:
import pymc3 as pm
import theano.tensor as tt

# coords, home_idx, away_idx and df_all are defined as in the linked example
with pm.Model(coords=coords) as model:
    # constant data
    home_team = pm.Data("home_team", home_idx, dims="match")
    away_team = pm.Data("away_team", away_idx, dims="match")

    # global model parameters
    home = pm.Normal("home", mu=0, sigma=1)
    sd_att = pm.HalfNormal("sd_att", sigma=2)
    sd_def = pm.HalfNormal("sd_def", sigma=2)
    intercept = pm.Normal("intercept", mu=3, sigma=1)
    team_quality = pm.Normal("team_quality", mu=0, sigma=2)

    # team-specific model parameters
    atts_star = pm.Normal("atts_star", mu=0, sigma=sd_att, dims="team")
    defs_star = pm.Normal("defs_star", mu=0, sigma=sd_def, dims="team")

    atts = pm.Deterministic("atts", atts_star - tt.mean(atts_star), dims="team")
    defs = pm.Deterministic("defs", defs_star - tt.mean(defs_star), dims="team")
    # new term: global coefficient times the extra data column
    team_qual = pm.Deterministic("quals", team_quality * df_all["team_quality"].values, dims="team")
    home_theta = tt.exp(intercept + home + atts[home_idx] + defs[away_idx] + team_qual[home_idx])
    away_theta = tt.exp(intercept + atts[away_idx] + defs[home_idx] + team_qual[away_idx])

    # likelihood of observed data
    home_points = pm.Poisson(
        "home_points",
        mu=home_theta,
        observed=df_all["home_score"],
        dims="match",
    )
    away_points = pm.Poisson(
        "away_points",
        mu=away_theta,
        observed=df_all["away_score"],
        dims="match",
    )
This code begins sampling but then fails with “RuntimeError: Chain 3 failed.”
My question is: how does one correctly incorporate an additional variable directly from the data?
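To make the intent concrete, here is a minimal plain-Python sketch (no PyMC) of the linear predictor I think I am after. It assumes the extra column holds one value per match, defined as home quality minus away quality, and that the away side sees the same difference with the opposite sign. The toy data, the parameter values, and the names `quality_diff`, `beta` and `expected_points` are all made up for illustration. I have not confirmed that a shape mismatch is what makes the chain fail.

```python
import math

# Hypothetical toy data: 3 teams, 4 matches (not the real Six Nations data).
home_idx = [0, 1, 2, 0]                # home team index for each match
away_idx = [1, 2, 0, 2]                # away team index for each match
quality_diff = [0.5, -1.0, 0.2, 0.0]   # assumed: home minus away quality, one value per match

# Made-up parameter values standing in for a single posterior draw.
intercept, home, beta = 3.0, 0.2, 0.1
atts = [0.1, 0.0, -0.1]                # one entry per team
defs = [-0.05, 0.05, 0.0]              # one entry per team


def expected_points(home_idx, away_idx, quality_diff):
    """Return per-match expected home and away points under the log-linear model."""
    home_theta, away_theta = [], []
    for m, (h, a) in enumerate(zip(home_idx, away_idx)):
        # Team effects are looked up by team index, the covariate by match position.
        home_theta.append(
            math.exp(intercept + home + atts[h] + defs[a] + beta * quality_diff[m])
        )
        # Assumption: the away side sees the same difference with the opposite sign.
        away_theta.append(
            math.exp(intercept + atts[a] + defs[h] - beta * quality_diff[m])
        )
    return home_theta, away_theta


home_theta, away_theta = expected_points(home_idx, away_idx, quality_diff)
print(len(home_theta), len(away_theta))  # one expected rate per match on each side
```

If this is the right structure, I would expect the covariate in the PyMC model to be a length-`match` quantity (for example registered with `pm.Data(..., dims="match")`) that is multiplied by the global coefficient and added directly, without indexing it by `home_idx` or `away_idx`. Is that correct?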