Really good suggestion, makes a lot of sense, thank you! I ended up getting a better fit by switching from the multivariate formulation to a no-pooling Gaussian random walk (GRW) and adding an AR1 term. Here's the model:
```python
import numpy as np
import pymc3 as pm

# t: round index of each match; team1_/team2_: home/away team indices;
# result: observed goal difference; team_dct: team -> index mapping
n_rounds = len(np.unique(t))
n_teams = len(team_dct)

with pm.Model() as m2c:
    # latent variables
    team_bias = pm.GaussianRandomWalk('team_bias', sigma=1, shape=(n_rounds, n_teams))
    mu_form = pm.Normal('mu_form', 0, 0.5)
    team_form = pm.AR1('team_form', k=mu_form, tau_e=0.1, shape=(n_rounds, n_teams))

    # home team advantage
    mu_home = pm.Normal('mu_home', 0.3, 0.4)
    home = pm.Normal('home', mu_home, 0.5, shape=n_teams)

    # match-level noise
    sigma_y = pm.HalfNormal('sigma_y', 0.1)  # match-level sd

    ability = pm.Deterministic('ability', team_bias + team_form)
    mu_match = home[team1_] + ability[t, team1_] - ability[t, team2_]
    y = pm.Normal('y', mu=mu_match, sigma=sigma_y, observed=result)

    trace2c = pm.sample(1000, tune=1000, return_inferencedata=True)
```
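For reference, the latent structure this implements is (my notation, with $k = \mu_\text{form}$; assuming I'm reading `pm.AR1`'s parameterisation right, i.e. `tau_e` is the innovation precision):

$$\text{bias}_{t,i} = \text{bias}_{t-1,i} + \epsilon_{t,i}, \qquad \epsilon_{t,i} \sim \mathcal{N}(0, 1)$$
$$\text{form}_{t,i} = k\,\text{form}_{t-1,i} + \eta_{t,i}, \qquad \eta_{t,i} \sim \mathcal{N}(0, \tau_e^{-1/2}),\quad \tau_e = 0.1$$
$$y \sim \mathcal{N}\!\left(\text{home}_{h} + \text{ability}_{t,h} - \text{ability}_{t,a},\ \sigma_y\right)$$

with $\text{ability}_{t,i} = \text{bias}_{t,i} + \text{form}_{t,i}$ for home team $h$ and away team $a$ at round $t$.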
Two issues I’m running into:
- I can’t figure out why having both a RW and an AR1 process (see the restatement above) leads to the best model. Is that weird theoretically?
- Is there a clever way to add a prior that limits extreme match results and puts more weight on predictions around 0? An example of why my current model is problematic is below.
I don’t think it makes much sense that the probability of Milan beating Parma by 4+ goals is the same as the probability of them drawing or losing.
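Here's a minimal numeric sketch of what I mean (the `mu_match` and `sigma_y` values are made up for illustration, not taken from my posterior):

```python
from scipy.stats import norm

# hypothetical posterior means for a Milan (home) vs Parma match
mu_match = 2.0   # expected goal difference in Milan's favour
sigma_y = 2.0    # match-level sd

p_win_by_4_plus = norm.sf(4, loc=mu_match, scale=sigma_y)   # P(margin >= 4)
p_draw_or_lose = norm.cdf(0, loc=mu_match, scale=sigma_y)   # P(margin <= 0)

print(p_win_by_4_plus, p_draw_or_lose)  # both ~0.159, equal by symmetry
```

Because the Normal is symmetric about `mu_match`, any margin that far above the mean is exactly as likely as the mirrored outcome below it, which is how the model ends up giving blowouts the same weight as draws or losses.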
