Hi, how’s it going?
I’m trying to model the goals scored by individual players, and I’ve realized that my model sometimes returns predictions that wouldn’t be possible in the real world. For example, because Barcelona has so many great players, the individual predictions for their team can add up to 6-7 goals in a single game.
Here’s a sample dataset and a bare-bones regression model for some context.
import pandas as pd
import pymc as pm  # on older installs: import pymc3 as pm

df = pd.DataFrame({
    'team': ['Real Madrid', 'Real Madrid', 'Real Madrid', 'Barcelona', 'Barcelona', 'Barcelona'],
    'team_goals': [3, 3, 3, 4, 4, 4],
    'player': ['Karim Benzema', 'Luka Modric', 'Sergio Ramos', 'Lionel Messi', 'Luis Suárez', 'Antoine Griezmann'],
    'average_player_goal_per_game': [.58, .09, .27, .67, .34, .36],
    'player_goals': [2, 1, 0, 1, 2, 1],
})

# Predictor and outcome as plain NumPy arrays
x = df['average_player_goal_per_game'].values
y = df['player_goals'].values

with pm.Model() as goals_model:
    # Priors for the intercept, slope, and noise scale
    a = pm.Normal("a", 0, 1)
    bA = pm.Normal("bA", 0, 1)
    sigma = pm.Uniform("sigma", 0, 1)
    # Expected goals for each player
    mu = pm.Deterministic("mu", a + bA * x)
    goals = pm.Normal("goals", mu=mu, sigma=sigma, observed=y)
    trace_goals = pm.sample()
My goal is to add some sort of constraint so that the sum of the predicted goals for any given team has a ceiling defined by a distribution on the ‘team_goals’ column.
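To make the problem concrete, here is a rough sketch of the check I have in mind: add up the per-player predictions for each team and flag any team whose total goes above its 'team_goals' value. The `predicted_goals` numbers and the `team_prediction_check` helper are made up for illustration (they are not output from the model above), and this only detects the issue, it doesn't enforce the constraint.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["Real Madrid"] * 3 + ["Barcelona"] * 3,
    "team_goals": [3, 3, 3, 4, 4, 4],
    "player": ["Karim Benzema", "Luka Modric", "Sergio Ramos",
               "Lionel Messi", "Luis Suárez", "Antoine Griezmann"],
    # Hypothetical per-player predictions, invented for this example only
    "predicted_goals": [1.2, 0.4, 0.6, 2.5, 2.0, 2.0],
})

def team_prediction_check(df, pred_col="predicted_goals"):
    """Sum player-level predictions per team and compare to team_goals."""
    summary = df.groupby("team", sort=False).agg(
        predicted_total=(pred_col, "sum"),
        team_goals=("team_goals", "first"),  # same value on every row of a team
    )
    # True where the summed player predictions exceed the team total
    summary["exceeds_ceiling"] = summary["predicted_total"] > summary["team_goals"]
    return summary

summary = team_prediction_check(df)
print(summary)
```

With these made-up numbers, Barcelona's players sum to more than the team's 4 goals, which is exactly the kind of result I'd like the model to rule out.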
If anyone can help me or point me in the right direction, I’d greatly appreciate it.