Diagonal Inflated Bivariate Poisson

Hello,
I am trying to predict football scores (thrilling, I know :smiley:) for my studies.
There is quite a bit of literature in this field, and everyone uses Poisson regression (dependent or independent) to predict the goals for the home and the away team and then calculates the match results from these distributions. So far, so easy.

My problem right now is that the model doesn’t predict any draws, meaning P(goals_home == goals_away) is never higher than P(goals_home > goals_away) or P(goals_home < goals_away). This problem isn’t unique to my model, but it seems more pronounced here: in roughly 600 predicted test matches the model produces 0 draw predictions.

There are a few proposed solutions, but most authors seem to agree that modelling the goals via Poisson works well enough (sadly not for me :frowning: ). Another proposed solution is to model the goal difference via a Skellam distribution (the Poisson difference distribution) and, to tackle the lack of draws, use a zero-inflated Skellam. This sounds logical to me; unfortunately PyMC doesn’t offer the Skellam distribution, and I doubt I am capable of implementing it myself (if that is even possible).
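(For reference, this is roughly what I imagine a hand-rolled Skellam likelihood would look like via pm.CustomDist. It is completely untested, and it assumes pytensor exposes the modified Bessel function iv, which I have not verified:)

import numpy as np
import pymc as pm
import pytensor.tensor as pt

def skellam_logp(value, mu1, mu2):
    # Skellam log-pmf: log P(k) = -(mu1 + mu2) + (k / 2) * log(mu1 / mu2) + log I_|k|(2 * sqrt(mu1 * mu2))
    # pt.iv is assumed to be the modified Bessel function of the first kind
    return (-(mu1 + mu2)
            + (value / 2) * (pt.log(mu1) - pt.log(mu2))
            + pt.log(pt.iv(pt.abs(value), 2 * pt.sqrt(mu1 * mu2))))

with pm.Model() as skellam_model:
    mu1 = pm.Exponential("mu1", 1.0)   # home scoring rate
    mu2 = pm.Exponential("mu2", 1.0)   # away scoring rate
    goal_diff = pm.CustomDist("goal_diff", mu1, mu2, logp=skellam_logp,
                              observed=np.array([1, 0, -2, 0, 3]))  # made-up goal differences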

This is my model right now; please feel free to criticize everything, even if it is not directly related to the draw problem at hand.

import numpy as np
import pymc as pm

with pm.Model() as independent_poisson:
    # data containers (column 0 = home, column 1 = away where applicable)
    pm_features = pm.Data("pm_features", features, mutable=True)
    pm_gi = pm.Data("pm_gi", gi, mutable=True)   # not used in log_lam below
    pm_form = pm.Data("pm_form", form, mutable=True)
    pm_goals = pm.Data("pm_goals", goals, mutable=True)
    pm_elo = pm.Data("pm_elo", elo, mutable=True)

    # regression coefficients, one column per team (home, away)
    coefs_features = pm.Normal('coefs_features', mu=[[0.5, 0], [0.5, 0], [0, 0.5], [0, 0.5]],
                               sigma=[[2, 0.001], [2, 0.001], [0.001, 2], [0.001, 2]],
                               shape=(features.shape[1], 2))
    coefs_elo = pm.Normal('coefs_elo', mu=[1, 1], sigma=[0.5, 0.5], shape=(1, 2))
    coefs_form = pm.Normal('coefs_form', mu=[[1, -0.5], [-0.5, 1]],
                           sigma=[[0.5, 0.05], [0.05, 0.5]], shape=(form.shape[1], 2))

    # mixing weights for the three predictor groups
    factor = pm.Dirichlet("factor", a=np.ones(3))

    intercepts = pm.Normal("intercepts", shape=2)

    # log of the Poisson rates for home and away goals
    log_lam = pm.Deterministic("log_lam", intercepts + factor[0]*(pm_elo @ coefs_elo) +
                                                       factor[1]*(pm_form @ coefs_form) +
                                                       factor[2]*(pm_features @ coefs_features))

    lam = pm.math.exp(log_lam)

    obs = pm.Poisson("obs", mu=lam, observed=pm_goals)

I also tried a dependent Poisson model using pm.MvNormal, but without any more success.
So my general question might be: is there a way to produce rates that are closer together (maybe don't use exp?), or other approaches, like the zero-inflated Skellam, that would produce more draws?

Thank you very much in advance for your help!


Is it possible to see these predictions? That might help see where/how the problem is arising. Also, @Martin_Ingram might have some insights, as he has done prior work on this sort of model.


Hey both, this is an interesting problem. I agree with @cluhmann that it would be interesting to be able to actually run this model. Would you be willing to share your input data? This would be useful to try, so that we can distinguish whether the problem is actually with the Poisson distribution, or with the way the current model is implemented.

Just naively, I’m a little surprised that the most likely outcome would never be a draw, even with a Poisson likelihood. Imagine e.g. that both teams are just exceptionally low scoring; in that case, I would have thought that the Poissons would both have a mode at zero, which should make a draw most likely. But it’s possible I’m missing something – I’ve worked with sports data a lot, but not a lot of soccer data, so I am not sure about the limitations of the Poisson distribution.
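A quick numerical check of this intuition, with equal, made-up scoring rates for both teams (a small scipy sketch, not tied to the model in this thread):

import numpy as np
from scipy.stats import poisson

def outcome_probs(lam_home, lam_away, max_goals=15):
    """Exact outcome probabilities for two independent Poisson scores."""
    p_home = poisson.pmf(np.arange(max_goals + 1), lam_home)
    p_away = poisson.pmf(np.arange(max_goals + 1), lam_away)
    table = np.outer(p_home, p_away)              # table[i, j] = P(home = i, away = j)
    return np.tril(table, -1).sum(), np.trace(table), np.triu(table, 1).sum()

# equal, very low scoring rates: the draw is indeed the most likely outcome
print(outcome_probs(0.5, 0.5))   # ~ (0.27 home win, 0.47 draw, 0.27 away win)

# equal rates around typical football scoring: a win is already more likely than a draw
print(outcome_probs(1.3, 1.3))   # ~ (0.37 home win, 0.26 draw, 0.37 away win)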


Hey both, thanks for your answers.
Of course, I have attached the training and test data. I also changed the model to include the data loading parts.
There is still a lot of uncertainty about which data to use and how, and I am still testing quite a bit, but I think the model should be able to predict some draws regardless, even if it's the wrong prediction :sweat_smile:.

I looked some more into the model’s predictions, and I am actually surprised at how good they look, which makes it even more irritating that the model doesn’t predict draws. The following figure shows the actual goal distribution for the home and away team respectively (blue) and the predicted one (red). Quite good, I think.
[Figure: actual (blue) vs. predicted (red) goal distributions for the home and away teams]

I also plotted the “actual” (as in observed) probability of match results and the one the model predicts. (Cell 0,0 represents a 0:0 draw, cell 1,0 a 1:0 home win, and so on.) Again, the model is not spot on, but quite close.

After seeing these, I am even more clueless than before as to why my model doesn’t like draws :D.
And just to clarify again, it’s not that the probability of a draw is zero, but the probability of a home win or an away win is always higher.
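(Roughly, the per-match outcome probabilities can be read off the posterior predictive samples like this; an untested sketch rather than my exact code:)

with independent_poisson:
    idata = pm.sample()
    pm.sample_posterior_predictive(idata, extend_inferencedata=True)

pred = idata.posterior_predictive["obs"].values   # shape (chain, draw, match, 2)
home, away = pred[..., 0], pred[..., 1]

p_home_win = (home > away).mean(axis=(0, 1))      # one probability per match
p_draw     = (home == away).mean(axis=(0, 1))
p_away_win = (home < away).mean(axis=(0, 1))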


Also, I did some more research and found this paper, which describes essentially this problem and a solution. Unfortunately, I am uncertain whether it is possible to implement it with PyMC.

We propose a more general model formulation which inflates the probabilities in the diagonal of the probability table

The mentioned probability table corresponds to table 2 in figure 2. It can be calculated as the outer product $p \, q^{\top}$, where $p_i = P(G_1 = i)$ with $G_1 \sim \mathrm{Poisson}(\lambda_1)$ and $q_j = P(G_2 = j)$ with $G_2 \sim \mathrm{Poisson}(\lambda_2)$ (I hope this formulation is at least somewhat logical).

In this table, they inflate the probabilities of the diagonal, corresponding to draws because g_1 = g_2.

Is this behavior possible with PyMC?
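Something along these lines is what I picture, though I have no idea whether it is valid PyMC. A rough, untested sketch where p, theta, and the use of pm.Potential are my own guesses, and the diagonal component is just a single Poisson on the common score (only one of the choices in the paper); it builds on lam and pm_goals from my model:

import pytensor.tensor as pt

# inside the model block, in place of obs = pm.Poisson("obs", mu=lam, observed=pm_goals)
p = pm.Beta("p", 2.0, 8.0)              # weight of the diagonal (draw) inflation
theta = pm.Exponential("theta", 1.0)    # rate of the diagonal component

home_goals = pm_goals[:, 0]
away_goals = pm_goals[:, 1]

# log-likelihood of the plain independent-Poisson part
base_logp = (pm.logp(pm.Poisson.dist(mu=lam[:, 0]), home_goals)
             + pm.logp(pm.Poisson.dist(mu=lam[:, 1]), away_goals))
# log-likelihood of the diagonal component
diag_logp = pm.logp(pm.Poisson.dist(mu=theta), home_goals)

log_base = pt.log1p(-p) + base_logp
log_diag = pt.log(p) + diag_logp
# numerically stable log(exp(log_base) + exp(log_diag)) for the draw cells
m = pt.maximum(log_base, log_diag)
draw_logp = m + pt.log(pt.exp(log_base - m) + pt.exp(log_diag - m))

mix_logp = pt.switch(pt.eq(home_goals, away_goals), draw_logp, log_base)
pm.Potential("diag_inflated_loglike", mix_logp.sum())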

Thank you for your answers and your help!

data_game_values_train.csv (5.7 MB)
data_game_values_test.csv (1.7 MB)

Ok, I wanted to edit/change my model in the first post, but somehow I’m unable to.
So here is the model, with the data loading, as it stands right now:

import numpy as np
import pymc as pm

# train_data: the attached data_game_values_train.csv (loaded e.g. via pandas)
# centred xG features, shape (n_matches, 4)
features = np.swapaxes(np.array([train_data["home_xG"] - np.mean(train_data["home_xG"]),
                                 train_data["away_xg_against"] - np.mean(train_data["away_xg_against"]),
                                 train_data["away_xG"] - np.mean(train_data["away_xG"]),
                                 train_data["home_xg_against"] - np.mean(train_data["home_xg_against"])]), 0, 1)

# scaled form / Elo differences and observed goals (column 0 = home, column 1 = away)
form_diff = np.swapaxes(np.array([(train_data["form_home"] / 15) - (train_data["form_away"] / 15)]), 0, 1)
goals = np.swapaxes(np.array([train_data["home_score"], train_data["away_score"]]), 0, 1)
elo_diff = np.swapaxes(np.array([(train_data["elo_home"] / 1000) - (train_data["elo_away"] / 1000)]), 0, 1)
                             
                            
with pm.Model() as independent_poisson:
    pm_features = pm.Data("pm_features", features, mutable=True)
    pm_form_diff = pm.Data("pm_form_diff", form_diff, mutable=True)
    pm_goals = pm.Data("pm_goals", goals, mutable=True)
    pm_elo_diff = pm.Data("pm_elo_diff", elo_diff, mutable=True)

    # regression coefficients, one column per team (home, away)
    coefs_features = pm.Normal('coefs_features', mu=[[0.5, 0], [0.5, 0], [0, 0.5], [0, 0.5]],
                               sigma=[[2, 0.001], [2, 0.001], [0.001, 2], [0.001, 2]],
                               shape=(features.shape[1], 2))
    coefs_elo_diff = pm.Normal('coefs_elo_diff', mu=[0.5, -0.5], shape=(1, 2))
    coefs_gi = pm.Normal('coefs_gi', mu=[[0.5, -0.5], [-0.5, 0.5]], sigma=0.5, shape=(2, 2))  # not used in log_lam below
    coefs_form_diff = pm.Normal('coefs_form_diff', mu=[0.5, -0.5], shape=(form_diff.shape[1], 2))

    # mixing weights for the three predictor groups
    factor = pm.Dirichlet("factor", a=np.ones(3))
    home_advantage = pm.HalfNormal("home_advantage", sigma=[1, 0.001], shape=2)
    intercepts = pm.Normal("intercepts", shape=2)

    # log of the Poisson rates for home and away goals
    log_lam = pm.Deterministic("log_lam", intercepts + home_advantage +
                                          factor[0]*(pm_elo_diff @ coefs_elo_diff) +
                                          factor[1]*(pm_form_diff @ coefs_form_diff) +
                                          factor[2]*(pm_features @ coefs_features))

    lam = pm.math.exp(log_lam)

    obs = pm.Poisson("obs", mu=lam, observed=pm_goals)

Also, I’m not really sure if I should use the Dirichlet here, as it sometimes effectively removes some factors from the equation by making their weights really small.

I agree. However, I am not seeing where the model “doesn’t like draws”. Looking at the matrix of predictions, the diagonal looks pretty close to the diagonal of the empirical matrix. So what are you seeing that you think is “wrong”?

The diagonal is never the maximum for any specific game, so a draw is never the most likely result.
But I guess it’s good enough as it is.
Thanks for your help :slight_smile: