Handling missing alternatives (NaNs) in multinomial logit with PyMC?

Hello,

I’m trying to build a simple multinomial logit model in PyMC using pm.Categorical + softmax.

My data is something like distances to different options, but not all options exist for every observation. So when I pivot to a matrix (obs × alternatives), I get NaNs.

Example:

```python
walk = pm.Data(
    "walk",
    df.pivot(index="obs_id", columns="alt", values="distance").values,
    dims=["obs", "alt"]
)
```

This fails because of NaNs:

```
Masked arrays or arrays with nan entries are not supported
```
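The NaNs come from `pivot` leaving a cell empty whenever an (observation, alternative) pair has no row in the long data. One way to get arrays that `pm.Data` accepts is to split the pivot into two NaN-free pieces: a boolean availability mask and a distance matrix with an arbitrary finite placeholder in the missing cells. A minimal pandas sketch; the toy data and the `0.0` placeholder are assumptions for illustration, and the placeholder only stays harmless if the model later masks those cells:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: obs 2 has no "bus" row, obs 3 has no "walk" row
df = pd.DataFrame(
    {
        "obs_id": [1, 1, 1, 2, 2, 3, 3],
        "alt": ["bus", "car", "walk", "car", "walk", "bus", "car"],
        "distance": [2.0, 5.0, 1.0, 4.0, 0.5, 3.0, 6.0],
    }
)

# pivot puts NaN wherever an (obs_id, alt) pair is missing
wide = df.pivot(index="obs_id", columns="alt", values="distance")
dist_raw = wide.to_numpy()
alts = wide.columns.tolist()

# True where the alternative is part of that observation's choice set
available = ~np.isnan(dist_raw)

# Finite placeholder in unavailable cells (assumed to be masked out in the model)
distance = np.where(available, dist_raw, 0.0)

# Every observation needs at least one available alternative
assert available.any(axis=1).all()

print(alts)
print(available.astype(int))
```

Both `distance` and `available` are NaN-free, so either can go through `pm.Data`.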

My question is:

What is the correct way in PyMC to handle alternatives that are not available for some observations?

Should I fill NaNs with large values (big distance)?

Or is there a proper way to define an availability mask in the model?

I want the model to understand that some alternatives are not part of the choice set, not just “bad”.
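The difference between “bad” and “not in the choice set” shows up directly in the numbers. A NumPy sketch with made-up distances and hypothetical values of a distance coefficient `beta`, comparing a large fill value against masking the logit to -inf. With the big-distance fill, the unavailable alternative is only unlikely when `beta` is strongly negative, and with `beta` near 0 it gets about a third of the probability. With the mask it gets exactly 0 for every `beta`:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    z = np.exp(x - x.max())
    return z / z.sum()

# Hypothetical distances; the third alternative is unavailable for this observation
distance = np.array([1.0, 2.0, 50.0])  # 50.0 = "big distance" fill value
available = np.array([True, True, False])

results = {}
for beta in (-1.0, -0.001, 0.0):  # hypothetical distance coefficients
    p_big = softmax(beta * distance)  # alternative treated as merely "bad"
    p_mask = softmax(np.where(available, beta * distance, -np.inf))  # alternative removed
    results[beta] = (p_big[2], p_mask[2])
    print(f"beta={beta}: big-distance p={p_big[2]:.3g}, masked p={p_mask[2]}")
```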

Any guidance or examples would be really helpful.

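You can set the respective logits (the utilities that go into the softmax) to -inf, which become probabilities of exactly 0 after the softmax. As long as you don’t have any rows where all entries are impossible (a row of all -inf makes the softmax return NaN), it should be fine.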


Really appreciate it.

Note also that @ricardoV94’s solution respects Luce’s choice axiom:
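Luce’s choice axiom implies that the odds of choosing alternative i over j are the same in every choice set that contains both. Masking with -inf gives exactly that: the softmax over an available set S is exp(v_i) / Σ_{k∈S} exp(v_k), so the ratio of any two probabilities is exp(v_i - v_j) regardless of S. A small NumPy check with made-up utilities and a hypothetical `masked_softmax` helper:

```python
import numpy as np

def masked_softmax(v, available):
    """Softmax over the available alternatives; unavailable ones get logit -inf."""
    logits = np.where(available, v, -np.inf)
    z = np.exp(logits - logits.max())  # exp(-inf) evaluates to 0.0
    return z / z.sum()

v = np.array([0.5, -1.0, 2.0])  # hypothetical utilities for bus, car, walk

p_full = masked_softmax(v, np.array([True, True, True]))
p_sub = masked_softmax(v, np.array([True, True, False]))  # "walk" not in the choice set

# The bus/car odds equal exp(0.5 - (-1.0)) = exp(1.5) in both choice sets (about 4.48)
print(p_full[0] / p_full[1], p_sub[0] / p_sub[1], np.exp(1.5))
print(p_sub[2])  # 0.0
```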
