Hello,
I want to model the number of goals a player scores in a football (soccer) match using a zero-inflated Poisson model. The problem is that I cannot just model the probability of playing in a specific match, because a player may play a different number of minutes in each match. So I was thinking about modelling p as the probability of playing in a given minute.
Here’s what my data looks like:
| minutes_played | goals |
| 56 | 1 |
| 90 | 2 |
| 0 | 0 |
...
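To make the per-minute idea concrete, here is a minimal sketch using only the three example rows from the table above. It assumes a fixed 90-minute match (no stoppage or extra time) and treats every minute as an independent trial, which is what a Binomial with n = 90 implies. The names `MATCH_LENGTH` and `p_hat` are just for illustration.

```python
import numpy as np

# The three example rows from the table above.
minutes_played = np.array([56, 90, 0])
MATCH_LENGTH = 90  # assumption: fixed match length, no stoppage or extra time

# If each minute is an independent trial with probability p, then
# minutes_played ~ Binomial(MATCH_LENGTH, p), and the maximum-likelihood
# estimate of p is total minutes played / total minutes available.
p_hat = minutes_played.sum() / (MATCH_LENGTH * len(minutes_played))
print(round(p_hat, 3))  # 0.541
```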
So here is how I try to model it:
with pm.Model() as zip_model:
    a_mu = pm.Normal('au', 0, 1)
    a_sigma = pm.Exponential('a_sigma', 1)
    ap = pm.Normal('ap', a_mu, a_sigma)
    p = pm.math.invlogit(ap)
    # Probability of playing in a given minute
    y = pm.Binomial('y', p=p, n=90, observed=data.minutes_played)
    al = pm.Normal('al', 0, 1)
    lambda_ = pm.math.exp(al)
    goals = pm.ZeroInflatedPoisson('goals', psi=pm.math.invlogit(y), theta=lambda_, observed=data.goals)
    trace = pm.sample(draws=4000, tune=1000)
The problem is that I get a warning that the effective sample size is below 200 for some parameters (how bad is that?), and there were ~450 divergences after tuning…
Am I missing something in the model? The simulation results seem a bit off:
sim_goals = np.random.binomial(n = 1,p = logistic(trace['ap'])).reshape(-1,1)*np.random.poisson(np.exp(trace['al']))
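For reference, here is a self-contained version of that simulation that runs without the trace. It is only a sketch: `logistic` is assumed to be the standard inverse-logit, and `ap_draws` / `al_draws` are made-up stand-ins for `trace['ap']` and `trace['al']`, not real posterior draws. It also shows the shape the line above produces, since the `reshape(-1, 1)` makes NumPy broadcast every indicator against every Poisson draw.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(x):
    # Standard inverse-logit; assumed to match the `logistic` helper used above.
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical stand-ins for trace['ap'] and trace['al'] (not real posterior draws).
n_draws = 1000
ap_draws = rng.normal(0.0, 1.0, size=n_draws)
al_draws = rng.normal(0.0, 1.0, size=n_draws)

# One simulated match per draw: a 0/1 indicator times a Poisson count.
plays = rng.binomial(n=1, p=logistic(ap_draws))
counts = rng.poisson(np.exp(al_draws))
sim_goals = plays * counts  # element-wise pairing, shape (n_draws,)

# The expression above reshapes the indicator to (n_draws, 1) first,
# so broadcasting yields an (n_draws, n_draws) array instead.
sim_goals_broadcast = plays.reshape(-1, 1) * counts
print(sim_goals.shape, sim_goals_broadcast.shape)  # (1000,) (1000, 1000)
```

The diagonal of the broadcast array equals the element-wise version, so the two differ only in whether each indicator is paired with its own draw or with all of them.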