Multinomial Softmax Regression, Categorical Predictor in 2d Arrays?

Goodmanngl · February 23, 2023, 3:27pm

Hello,

I’m having some issues with dimensionality in a multinomial softmax regression I have implemented for who will score a given goal in a football team.

I have successfully implemented the model with several numerical predictors (e.g. number of minutes played, previous average form), however I am having difficulties with categorical predictors, specifically the position each player plays.

The observations (goals scored) have shape (number of matches x number of players)

e.g. [[0,0,0,1,2,0,0],[0,0,0,1,0,0,0],[0,2,0,1,2,0,0]]

where each row is a match, and each position in the array corresponds to the number of goals that player scored.

The predictor data is in the same format:
e.g. [[GK,FW,MF,DF,MF,FW,DF],[GK,FW,FW,DF,MF,FW,DF],[GK,FW,MF,MF,MF,FW,MF]] (also numerically encoded with an index in a separate array)

Where each row is a match and each position corresponds to the player’s position in that match.
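For readers following along, here is a minimal sketch of the numeric encoding mentioned above, assuming the positions are held in a NumPy array of strings. The names `positions`, `position_labels` and `pos_idx` are illustrative and do not come from the original post.

```python
import numpy as np

# Position of each player in each match (rows = matches, columns = players),
# copied from the example in the post.
positions = np.array([
    ["GK", "FW", "MF", "DF", "MF", "FW", "DF"],
    ["GK", "FW", "FW", "DF", "MF", "FW", "DF"],
    ["GK", "FW", "MF", "MF", "MF", "FW", "MF"],
])

# np.unique returns the sorted labels and, for each element of the flattened
# input, the index of its label. Reshape that index back to (matches, players).
position_labels, flat_idx = np.unique(positions.ravel(), return_inverse=True)
pos_idx = flat_idx.reshape(positions.shape)

print(position_labels)
print(pos_idx)
```

Indexing the label array with `pos_idx` recovers the original position strings, so no information is lost by the encoding.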

Previously I have set this up by creating separate boolean arrays for each position (e.g. a Midfield array that is 1 where the position is MF, else 0) and used the following model in pymc successfully:

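import pymc as pm

# fw_arr, amc_arr, ..., ttens, n, k and gls_scored_arr are defined elsewhere
with pm.Model() as pos_model:

    # One coefficient per position, each with the default Normal(0, 1) prior
    delta_fw = pm.Normal('delta_fw')
    delta_dc = pm.Normal('delta_dc')
    delta_sub = pm.Normal('delta_sub')  # declared, but not used in mu_xg below
    delta_amc = pm.Normal('delta_amc')
    delta_fb = pm.Normal('delta_fb')
    delta_dm = pm.Normal('delta_dm')
    delta_mid = pm.Normal('delta_mid')

    # Linear predictor: each boolean position array times its coefficient
    mu_xg = (
        pm.math.dot(fw_arr, delta_fw)
        + pm.math.dot(amc_arr, delta_amc)
        + pm.math.dot(fb_arr, delta_fb)
        + pm.math.dot(dm_arr, delta_dm)
        + pm.math.dot(mid_arr, delta_mid)
        + pm.math.dot(dc_arr, delta_dc)
    )

    # Softmax across players within each match
    p_xg = pm.Deterministic('p_xg', pm.math.softmax(mu_xg, axis=1))

    counts_xg = pm.Multinomial("counts_xg", n=ttens, p=p_xg, shape=(n, k), observed=gls_scored_arr)

    trace = pm.sample(4000, chains=2)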

However as I need to increase the complexity of the model, I would like to learn how to use coords and dims to do this without having to split it out into different arrays for each position.

E.g. delta = pm.Normal('delta', shape=(nclass))
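As a rough illustration of why a single coefficient vector can stand in for the separate arrays, the NumPy sketch below checks that summing "mask times coefficient" over positions gives the same matrix as indexing one vector with the integer-encoded position array. The encoding (DF=0, FW=1, GK=2, MF=3) and the random values used for `delta` are assumptions made for this example only. The same integer-array indexing is commonly applied to PyMC variables, but this sketch only demonstrates the arithmetic.

```python
import numpy as np

# Integer-encoded positions (rows = matches, columns = players).
# Assumed encoding for this example: DF=0, FW=1, GK=2, MF=3.
pos_idx = np.array([
    [2, 1, 3, 0, 3, 1, 0],
    [2, 1, 1, 0, 3, 1, 0],
    [2, 1, 3, 3, 3, 1, 3],
])
n_positions = 4

# Stand-in values for the per-position coefficients.
rng = np.random.default_rng(0)
delta = rng.normal(size=n_positions)

# Approach 1: one 0/1 mask per position, each scaled by its coefficient.
mu_masks = sum((pos_idx == k).astype(float) * delta[k] for k in range(n_positions))

# Approach 2: index the coefficient vector with the position array directly.
mu_indexed = delta[pos_idx]

print(np.allclose(mu_masks, mu_indexed))
```

Both approaches produce a (matches x players) matrix, so the rest of the model (softmax and likelihood) does not need to change.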

I’ve taken a look through the discourse and can’t seem to find anything similar that works, and also read up as much as possible on coords and dims in pymc to no avail… so any help on how I can do this would be greatly appreciated!

ricardoV94 · February 23, 2023, 5:58pm

Coords and dims are just labels for your shape, so they can’t solve any shape problem by themselves (although they often make it easier to reason about).

You may want to see if bambi could help you create the right regression model. They have support for one kind of multinomial regression: see the pull request “Add multinomial family” by tomicapretto (Pull Request #490, bambinos/bambi on GitHub).

Goodmanngl · February 23, 2023, 7:54pm

Thanks Ricardo for the quick reply. You’re right, I probably got too fixated on the coords and dims there, but the problem I really have is with the shape.

Thanks for the link to the Bambi multinomial model. I’m not sure if Bambi will be able to work long term for me, as I’ve integrated some masking into my pymc model (e.g. if a player plays 0 minutes, set to -inf) before the softmax, which I imagine I can’t do so easily in Bambi. But the pymc model in the same link is also some food for thought!
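The masking idea described above can be sketched in plain NumPy as follows. This is an illustration under assumptions, not the poster's actual code: the function name `masked_softmax`, the boolean array `played` and the example values are all hypothetical, and the sketch assumes every match has at least one player with non-zero minutes (a row that is entirely masked would produce NaN).

```python
import numpy as np

def masked_softmax(mu, played):
    """Row-wise softmax that gives zero probability where played is False."""
    # Players who did not play get -inf, so exp(-inf) = 0 after the softmax.
    masked = np.where(played, mu, -np.inf)
    # Subtract the row maximum for numerical stability.
    shifted = masked - masked.max(axis=1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)

# Hypothetical linear predictor: 2 matches x 3 players.
mu = np.array([[0.5, 1.0, -0.2],
               [0.0, 0.3, 0.3]])
# Player 1 did not play in match 0.
played = np.array([[True, False, True],
                   [True, True, True]])

p = masked_softmax(mu, played)
print(p)
```

The masked player receives probability exactly zero, and the remaining probabilities in that match still sum to one.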

I guess what I’m really after is suggestions to simplify the model above: instead of the separate arrays for each position category, use the original position array with all positions, and declare a single “delta” whose length is the number of different positions. I’ve tried this, but can’t get it to work so far… any thoughts about how I might go about this?
