How would I model individual goals scored as a parameter of total team goals scored in a PyMC3 regression model?

mldl920 · August 2, 2020, 5:08pm

Hi how’s it going?

I’m trying to model individual goals scored, and realized that sometimes my model is returning predictions that wouldn’t be possible in the real world. For example, because Barcelona has so many great players, it may return that in total they will score 6-7 goals in the game if I add up the individual predictions for their team.

Here’s a sample dataset… and bare bones regression model for some context.

import pandas as pd

df = pd.DataFrame({
    'team': ['Real Madrid', 'Real Madrid', 'Real Madrid', 'Barcelona', 'Barcelona', 'Barcelona'],
    'team_goals': [3, 3, 3, 4, 4, 4],
    'player': ['Karim Benzema', 'Luka Modric', 'Sergio Ramos', 'Lionel Messi', 'Luis Suárez', 'Antoine Griezmann'],
    'average_player_goal_per_game': [.58, .09, .27, .67, .34, .36],
    'player_goals': [2, 1, 0, 1, 2, 1],
})


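import pymc3 as pm

x = df['average_player_goal_per_game'].values
y = df['player_goals'].values

with pm.Model() as goals_model:
    # Priors on the intercept and on the slope for a player's scoring average
    a = pm.Normal("a", 0, 1)
    bA = pm.Normal("bA", 0, 1)

    sigma = pm.Uniform("sigma", 0, 1)

    # Expected goals for each player, linear in their per-game average
    mu = pm.Deterministic("mu", a + bA * x)
    goals = pm.Normal("goals", mu=mu, sigma=sigma, observed=y)
    trace_goals = pm.sample()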

My goal is to make some sort of constraint so that the sum of the output of goals for any given team has a ceiling that is defined by a distribution on the ‘team_goals’ column.

If anyone can help me or point me in the right direction, I’d greatly appreciate it.

DanWeitzenfeld · August 4, 2020, 11:01pm

My first thought is to model team goals, and then use a multinomial model to ‘divvy up’ the team’s goals to the players. Is there a reason you want to go in the other direction, so to speak, modeling the players and using that to estimate the teams?
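To make the "team first, then divvy up" idea concrete, here is a minimal generative sketch in plain NumPy rather than PyMC3. It is not a fitted model: the team scoring rate `team_rate` is a made-up value, and setting each player's share proportional to their per-game average is an assumption for illustration only. It draws total team goals from a Poisson distribution and then splits each total among the players with a multinomial draw.

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-game scoring averages from the sample dataset in the question.
barcelona = {"Lionel Messi": 0.67, "Luis Suárez": 0.34, "Antoine Griezmann": 0.36}

# Assumption: expected team goals per game (hypothetical value, not estimated).
team_rate = 2.0

# Assumption: each player's share of team goals is proportional to their average.
# A fuller model would add an "everyone else" category for the rest of the squad.
rates = np.array(list(barcelona.values()))
shares = rates / rates.sum()

n_sims = 10_000

# Step 1: draw total team goals for each simulated match.
team_goals = rng.poisson(team_rate, size=n_sims)

# Step 2: split each match's total among the players.
player_goals = np.array([rng.multinomial(n, shares) for n in team_goals])

# Player goals always add up to the team total, by construction.
print(player_goals[:5], team_goals[:5])
```

Because the player counts come from splitting the team total, their sum can never exceed the number of goals the team scored, which is the constraint asked about above.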

One reason the sum of the expected player-level goals is biased upward is that teams take their foot off the gas when they are leading (see my_blog_post). You could test this hypothesis by looking at the average goals scored by (e.g.) Ramos conditional on how many goals Benzema has.
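As a rough sketch of that check, assuming you have one row per match with each player's goals, a groupby gives the conditional averages. The numbers and column names below are invented purely to show the mechanics and say nothing about the real players.

```python
import pandas as pd

# Hypothetical match-level data (made up for illustration only).
matches = pd.DataFrame({
    "match_id": [1, 2, 3, 4, 5, 6],
    "benzema_goals": [0, 0, 1, 1, 2, 2],
    "ramos_goals": [1, 0, 0, 1, 0, 0],
})

# Average Ramos goals conditional on how many Benzema scored in the same match.
ramos_given_benzema = matches.groupby("benzema_goals")["ramos_goals"].mean()
print(ramos_given_benzema)
```

If the average drops as the teammate's goal count rises in real data, that would be consistent with the hypothesis, although a proper test would need many more matches and some control for opponent strength.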

mldl920 · August 5, 2020, 6:27pm

No reason at all to go the other way! Thanks for the advice. I’m a bit self-taught and new to PyMC3/Bayesian inference. Do you know of any good articles or resources that provide a guide to creating a hierarchical multinomial model that actually divvies up the first-level predictions into second-level predictions?

AlexAndorra · August 6, 2020, 8:46am

I used a hierarchical multinomial model to predict elections in Paris this year at the district level. This is of course not a sports model, but maybe you’ll find something interesting in it, and it’ll help you understand the concepts.

mldl920 · August 6, 2020, 2:43pm

Ok cool thank you so much!
