Thank you very much Junpeng, that’s very helpful!

What’s still blurry in my mind is whether this would require custom `logp`/`DensityDist` functions, or if multiple likelihoods in the model would suffice – I guess this will become clearer once I start writing the code.

I’m gonna try your suggestion ASAP this week and keep you posted

# How to implement Zero-Inflated Multinomial?

Ok so I think I managed to write an intercept-only model that accounts for the missing categories, thanks to your advice n°1 @junpenglao!

However, before claiming victory, I’d like to add a slope and test the model against simulated data. To do that, I’m using an MvNormal, so that the model accounts for potential covariation between intercepts and slopes:

```
with pm.Model() as m_slope:
    # covariance matrix:
    sd_dist = pm.Exponential.dist(1)
    # NB: the `shape` arguments below are what I *wish* worked -- see my question:
    packed_chol = pm.LKJCholeskyCov('chol_cov', eta=4, n=2, sd_dist=sd_dist, shape=Ncategories - 1)
    chol = pm.expand_packed_triangular(2, packed_chol, lower=True, shape=Ncategories - 1)
    # extract rho and sds:
    cov = pm.math.dot(chol, chol.T)
    sigma_ab = pm.Deterministic('sigma_district', tt.sqrt(tt.diag(cov)))
    corr = tt.diag(sigma_ab**-1).dot(cov.dot(tt.diag(sigma_ab**-1)))
    r = pm.Deterministic('Rho', corr[np.triu_indices(2, k=1)])
    # means:
    ab = pm.Normal('ab', mu=[-1.8, -0.5], sd=[0.1, 0.2], shape=(Ncategories - 1, 2))
    # varying effects:
    ab_cluster = pm.MvNormal('ab_cluster', mu=ab, chol=chol, shape=(Nclusters, Ncategories - 1, 2))
    mu = ab_cluster[cluster_id, :, 0] + ab_cluster[cluster_id, :, 1] * predictor
```

My goal is to get the `mu` matrix with all clusters and categories, which I can then pass to softmax (not shown here). But how do I get there, given that `pm.LKJCholeskyCov` doesn’t accept a `shape` argument?

Is there an elegant solution, or do I have to use a loop to construct the `mu` matrix progressively?

Thank you very much in advance for your help

Unfortunately you will need to use a loop, because the Cholesky factorization is not batchable (if I remember correctly).

Otherwise another possible solution is to construct a large blockwise covariance matrix.
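Roughly, the idea is to stack the per-category 2×2 covariance blocks into one big block-diagonal matrix, so a single MvNormal covers all categories at once. A NumPy sketch of just the assembly step (not a full PyMC model; the sizes and names are placeholders):

```python
import numpy as np
from scipy.linalg import block_diag

Ncategories = 4  # placeholder value

# One 2x2 covariance block per non-pivot category (intercept & slope),
# assembled into a single block-diagonal covariance matrix.
blocks = []
for p in range(Ncategories - 1):
    L = np.tril(np.random.randn(2, 2))  # stand-in for a per-category Cholesky factor
    blocks.append(L @ L.T)              # 2x2 covariance block
big_cov = block_diag(*blocks)           # shape (2*(Ncategories-1), 2*(Ncategories-1))
print(big_cov.shape)                    # (6, 6)
```

The off-block entries are zero by construction, i.e. intercept/slope pairs are correlated within a category but independent across categories.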

Ok thanks Junpeng, I’ll try the loop then

I think a large blockwise covariance matrix is what I’m looking for, but I don’t know how to do that

Ok what do you think of this?

```
with pm.Model() as m_slope:
    mus_cats = []
    for p in range(Ncategories - 1):
        sd_dist = pm.Exponential.dist(1)
        packed_chol = pm.LKJCholeskyCov(f"chol_cov_{p}", eta=4, n=2, sd_dist=sd_dist)
        chol = pm.expand_packed_triangular(2, packed_chol, lower=True)
        # extract rho and sds:
        cov = pm.math.dot(chol, chol.T)
        sigma_ab = pm.Deterministic(f"sigma_cluster_{p}", tt.sqrt(tt.diag(cov)))
        corr = tt.diag(sigma_ab ** -1).dot(cov.dot(tt.diag(sigma_ab ** -1)))
        r = pm.Deterministic(f"Rho_{p}", corr[np.triu_indices(2, k=1)])
        # means:
        ab = pm.Normal(
            f"ab_{p}", mu=np.array([-1.8, -0.5]), sd=np.array([0.1, 0.2]), shape=2
        )
        # varying effects:
        ab_cluster = pm.MvNormal(
            f"ab_cluster{p}", mu=ab, chol=chol, shape=(Nclusters, 2)
        )
        mus_cats.append(
            ab_cluster[cluster_id, 0]
            + ab_cluster[cluster_id, 1] * predictor
        )
    mus_cats = tt.as_tensor_variable(mus_cats).T
    # append last (pivot) category:
    vary_pivot = tt.as_tensor_variable(np.full(shape=(Nclusters, 1), fill_value=-2.2))
    mus_cats = tt.horizontal_stack(mus_cats, vary_pivot[cluster_id])
    # preferences in each cluster:
    lat_p = tt.nnet.softmax(mus_cats)
    # keep only preferences for available categories:
    cat_prob = available_cats * lat_p
    # normalize preferences:
    cat_prob = cat_prob / tt.sum(cat_prob, axis=1, keepdims=True)
    R = pm.Multinomial("R", n=N, p=cat_prob, observed=sim_R)
```

There are no divergences yet, but sampling is reeeeeaaaaaally slow, so I’m probably doing something wrong. Note that the intercept-only model samples very fast, so I think it’s the added covariance structure that slows things down.

Do you see a way to optimize the above implementation?

Thank you very much in advance

In the end, this model took three and a half hours to sample.

There are no divergences, but I’m trying to implement a non-centered version of the MvNormal – I suspect it makes sampling harder, as the intercept-only model samples in 2 minutes. That would mean that the line:

```
# varying effects:
ab_cluster = pm.MvNormal(f"ab_cluster{p}", mu=ab, chol=chol, shape=(Nclusters, 2))
```

becomes:

```
z = pm.Normal(f"z_{p}", 0., 1., shape=(Nclusters, 2))
vals_raw = ab + z
ab_cluster = pm.Deterministic(f"ab_cluster{p}", tt.dot(chol, vals_raw.T).T)
```

Does this seem right to you? And are there other optimizations I could make to speed up sampling, in your opinion?

*(I can share how I simulate the data if useful – didn’t do it here because it’s a big chunk of code not related to PyMC)*
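A quick NumPy sanity check of the non-centered idea, with placeholder values: if `z ~ N(0, I)`, then `mu + L·z` has mean `mu` and covariance `L·Lᵀ` (note that the mean is added *outside* the Cholesky product here):

```python
import numpy as np

rng = np.random.default_rng(42)
L = np.array([[0.1, 0.0], [0.05, 0.2]])  # placeholder Cholesky factor
mu = np.array([-1.8, -0.5])              # placeholder means
z = rng.standard_normal((100_000, 2))    # standard normal draws
samples = mu + z @ L.T                   # non-centered: mean added outside chol
print(samples.mean(axis=0))              # close to mu
print(np.cov(samples.T))                 # close to L @ L.T
```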

A non-centered version makes more sense to me, for sure.

I just ran this version and… it actually takes more time to sample

Did I make a mistake somewhere? Or is it just that this model can’t be optimized any further?

Hmmm, does it take more time for tuning, or more time in general?

Mmmh, good question. Is there a way to check the *tuning* time in PyMC? I only see the *total* time, and it definitely takes more – 5h against 3h30 for the centered model.

Actually, the non-centered sampler did not even converge…

Does this information help diagnosing the problem?

There are a few ways to benchmark your model, for example you can try https://docs.pymc.io/notebooks/profiling.html

Thanks Junpeng! Just did that on the two models, but I don’t really know how to interpret the output… Do you have any tips on where I should focus my attention?

Usually, I will look for the ops and check the ones that take the longest time.

I have to say it is not the easiest thing to do, as sometimes a small change in your model results in a completely different graph from Theano, and I have yet to figure out why…

Thanks Junpeng! So it seems that 80% of ops time comes from (results of `m_slope.profile(m_slope.logpt).summary()`):

```
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
   32.1%   32.1%       0.603s       1.00e-04s     Py    6000        6   Solve{A_structure='lower_triangular', lower=False, overwrite_A=False, overwrite_b=False}
   20.2%   52.3%       0.380s       3.17e-05s     Py   12000       12   AdvancedSubtensor
   11.1%   63.4%       0.209s       3.49e-05s     Py    6000        6   AdvancedIncSubtensor{inplace=False, set_instead_of_inc=True}
    3.9%   67.3%       0.073s       1.22e-05s     Py    6000        6   ExtractDiag{offset=0, axis1=0, axis2=1, view=False}
    3.2%   70.5%       0.060s       1.20e-05s      C    5000        5   IncSubtensor{Inc;int64}
    3.0%   73.5%       0.056s       9.41e-06s      C    6000        6   IncSubtensor{InplaceInc;int64::}
    2.9%   76.4%       0.055s       9.13e-06s      C    6000        6   AdvancedIncSubtensor1{no_inplace,set}
    2.9%   79.3%       0.054s       9.01e-06s      C    6000        6   CumOp{None, add}
```

It looks like the classes `LinearAlgebra`, `AdvancedSubtensor` and `AdvancedIncSubtensor` are taking the most time, but I’m not sure how to improve that…

I see these two tips at the end of the output:

```
- Try the Theano flag floatX=float32
- Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.
```
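For reference, Theano reads these flags from the `THEANO_FLAGS` environment variable (or from `~/.theanorc`), so they can be tried per-run without touching any code:

```shell
# Set the flag for the current shell session:
export THEANO_FLAGS='floatX=float32'
echo "$THEANO_FLAGS"

# Or make it permanent in ~/.theanorc:
#   [global]
#   floatX = float32
```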

Does it help in your experience – or would it in this case?

A huge thank you again for your time

Is your CPU AMD? I think amdlibm only applies to AMD CPUs.

What I found surprising is that some of the operations are not compiled to C, but I have not looked at any profiling results in a long time, so I might be misinformed.

Pinging @aseyboldt to see if he has some tips re looking at profiling and speeding up ops

Mmmh, not sure how to answer – sorry, I am truly clueless about this stuff

Here is my machine’s info (it’s a Mac), hoping that it answers your question:

```
compiler : Clang 4.0.1 (tags/RELEASE_401/final)
system : Darwin
release : 15.6.0
machine : x86_64
processor : i386
CPU cores : 2
interpreter: 64bit
```

Macs only use Intel CPUs, I think – and you set up your environment using conda? (If so, Intel MKL should be set up correctly.)
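One quick way to check which BLAS NumPy was linked against (with a conda install, MKL should be mentioned in the output if it’s being used):

```python
import numpy as np

# Print NumPy's build/link configuration; look for "mkl" in the
# BLAS/LAPACK sections to confirm MKL is in use.
np.show_config()
```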

Yes, it’s a conda env – don’t know if it’s useful, but my Mac is quite old (mid-2009).

FYI, I just ran the same model but without modeling the covariation between intercepts and slopes, and it took 10 minutes to sample. So it seems that it really is the addition of the covariance structure (MvNormal) that considerably slows down sampling.

Just realized I can mark this as solved, thanks to @junpenglao’s suggestion of considering that non-available categories have probability 0 – this implies we know in advance which categories are absent. Of course, I’m still curious about @aseyboldt’s thoughts on profiling and speed-ups, if he’s got any!

Here is a simplified version of the model, for the sake of readability and understanding how to implement the idea with PyMC:

```
with pm.Model() as m_multi:
    a_cluster = pm.Normal("a_cluster", -1.8, 0.5, shape=(Nclusters, Ncategories - 1))
    a_pivot = tt.as_tensor_variable(np.full(shape=(Nclusters, 1), fill_value=-2.2))
    a_cluster_f = tt.horizontal_stack(a_cluster, a_pivot)
    lat_p = tt.nnet.softmax(a_cluster_f[cluster_id])
    # zero-inflation process:
    # keep only preferences for available categories:
    slot_p = cats_available * lat_p
    # normalize preferences:
    slot_p = slot_p / tt.sum(slot_p, axis=1, keepdims=True)
    R = pm.Multinomial("R", n=N, p=slot_p, observed=sim_R)
    trace_multi = pm.sample(2000, tune=3000, init="adapt_diag")
```
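As a sanity check, the masking-and-renormalization step is easy to verify in plain NumPy (a standalone sketch with made-up numbers):

```python
import numpy as np

lat_p = np.array([[0.2, 0.3, 0.5],
                  [0.1, 0.6, 0.3]])  # latent preferences (rows sum to 1)
available = np.array([[1, 0, 1],
                      [1, 1, 1]])    # 0 = category absent in that cluster
masked = available * lat_p           # absent categories get probability 0
cat_prob = masked / masked.sum(axis=1, keepdims=True)
print(cat_prob)
# row 0 is renormalized to [2/7, 0, 5/7]; row 1 is unchanged
```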

Hope this will help future users!

For the curious out there, the real model is much more complicated (varying-effects and covariance) but I’ll open source it – as well as real and simulated data – once I’ve had time to clean it up!

And again, a big thank you to Junpeng – couldn’t have done it without your help! Now I owe you a second beer when you come to Paris