Hello,
I’m trying to build a simple multinomial logit model in PyMC using pm.Categorical + softmax.
My data consists of distances to the different alternatives, but not every alternative exists for every observation. So when I pivot to an (obs × alternatives) matrix, I get NaNs for the alternatives that are missing.
Example:
walk = pm.Data(
    "walk",
    df.pivot(index="obs_id", columns="alt", values="distance").values,
    dims=["obs", "alt"],
)
This fails because of NaNs:
Masked arrays or arrays with nan entries are not supported
My question is:
What is the correct way in PyMC to handle alternatives that are not available for some observations?
Should I fill NaNs with large values (big distance)?
Or is there a proper way to define an availability mask in the model?
I want the model to understand that some alternatives are not part of the choice set, not just “bad”.
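To make the mask idea concrete, this is roughly what I have in mind, sketched in plain NumPy rather than PyMC. The distances and the coefficient `beta` are made up for illustration, and setting the utility of an unavailable alternative to -inf is my own assumption. I don't know whether that is the recommended approach inside a PyMC model.

```python
import numpy as np

# Toy (obs x alt) distance matrix. NaN marks an alternative that does not
# exist for that observation. All values are made up for illustration.
distance = np.array([
    [1.0, 2.0, np.nan],
    [0.5, np.nan, 3.0],
])
beta = -1.0  # hypothetical distance coefficient

# Boolean availability mask: True where the alternative exists.
available = ~np.isnan(distance)

# Replace NaN with a placeholder (0.0) so the arithmetic stays finite,
# then push unavailable alternatives out of the choice set with -inf.
utility = beta * np.where(available, distance, 0.0)
utility = np.where(available, utility, -np.inf)

# Numerically stable softmax over alternatives (axis 1).
# Assumes every observation has at least one available alternative.
shifted = utility - utility.max(axis=1, keepdims=True)
weights = np.exp(shifted)  # exp(-inf) == 0 for unavailable alternatives
p = weights / weights.sum(axis=1, keepdims=True)

print(p)
```

Here an unavailable alternative gets probability exactly 0 and the remaining probabilities still sum to 1 per observation. Filling in a large distance instead would only make the probability very small, but in principle still non-zero, which is the distinction I am asking about.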
Any guidance or examples would be really helpful.