Inference issue using a model with Multinomial latent variables on multidimensional observations

Let’s say we have access to N observations of a phenomenon where, for each observation i, we observe a fixed number m of independent draws from a Negative Binomial whose parameters are \alpha_i (the shape parameter of the underlying Gamma distribution) and \mu_i (the mean of the underlying Gamma distribution).

The \alpha parameter is shared across all observations, thus \forall i, \alpha_i = \alpha.

However, \mu_i depends on a latent variable z_i that assigns each observation i to a hidden state among k possible states.

The latent variables z can be seen as indicator variables and modeled using a Multinomial distribution Multinomial(n=1, p_1, ..., p_k), where p_1, ..., p_k are the probabilities of the different outcomes. In this case z_i is a binary vector of size k with a single 1 at the position of the state to which observation i belongs.

Alternatively, the latent variables z can also be modeled using a Categorical distribution with probabilities p_1,...,p_k.
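To make the two encodings concrete, here is a small sketch (plain numpy, hypothetical values) for k = 4 states:

import numpy as np

k = 4
# Multinomial(n=1) encoding: a one-hot vector of size k
z_multinomial = np.array([0, 1, 0, 0])          # observation assigned to state 2

# Categorical encoding: a single integer in {0, ..., k-1}
z_categorical = int(np.argmax(z_multinomial))   # -> 1, i.e. state 2 (zero-based)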

Additionally, the Negative Binomial mean of each state 1, ..., k is known.

For example, imagine that observation i belongs to the second state: then z_i = (0, 1, 0, ..., 0) (or z_i = 2 in the categorical encoding), and this observation is a vector of m independent draws from a Negative Binomial with parameters \alpha and \mu_i equal to the mean of state 2.

Given a set of observations following this model, we want to infer \alpha, the latent variable z_i of each observation, and the state probabilities p_1, ..., p_k.

Simulating observations

import numpy as np
import pymc as pm

N, M, max_state = 10, 5, 8
prob = np.random.rand(max_state)
state_prob = prob / prob.sum()  # probabilities of each state
expected_mean = np.concatenate((np.array([1]), np.arange(1, max_state) * 10))  # mean of each state
alpha_gt = 0.0001  # ground-truth shape parameter

with pm.Model() as ind_simulation:
    w = state_prob
    latent_z = pm.Multinomial('z', n=1, p=w, shape=(N, max_state))
    state_assignment = latent_z.eval()  # for debugging purposes we evaluate the latent variable once

    alpha = alpha_gt
    # map each one-hot assignment to its state mean, then repeat it across the M draws
    mu = np.repeat((state_assignment @ expected_mean)[:, np.newaxis], M, axis=1)

    obs_data = pm.NegativeBinomial('ind', alpha=1/alpha, mu=mu)
    obs_draw = pm.draw(obs_data)

Here is a heatmap of the simulated data, which looks as expected in terms of shape and values: every value in a given row appears to be generated from the same Negative Binomial distribution.
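For reference, a minimal sketch of how such a heatmap can be produced (assuming matplotlib is available):

import matplotlib.pyplot as plt

# each row is one observation; each column is one of the M replicate draws
plt.imshow(obs_draw, aspect='auto', cmap='viridis')
plt.colorbar(label='count')
plt.xlabel('replicate draw')
plt.ylabel('observation')
plt.show()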

Sampling

We can now perform the inference:

with pm.Model() as inference_model:
    w = pm.Dirichlet("w", a=np.ones(max_state))
    latent_z = pm.Multinomial('z', n=1, p=w, shape=(N, max_state))

    alpha = pm.Uniform("alpha", lower=0, upper=0.1)
    # (N, 1) column of state means, repeated across the M draws to give a full (N, M) matrix
    mu = np.repeat(pm.math.matmul(latent_z, expected_mean.reshape(-1, 1))[:, 0][:, np.newaxis], M, axis=1)
    obs_distrib = pm.NegativeBinomial('obs', alpha=1/alpha, mu=mu, observed=obs_draw)

    obs_sample = pm.sample()
pm.model_to_graphviz(model=inference_model)

We give the \mu parameter of the Negative Binomial in the most explicit way, as a matrix matching the shape (N, M) of the observations, to avoid any ambiguity.
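A quick sanity check on the shape (note that .eval() draws the latent variables in order to evaluate the graph):

print(mu.shape.eval())  # expected: [10  5]
print(obs_draw.shape)   # (10, 5), so the two match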

I would have expected to see z and the observations in a big shared plate of size N, with the observations nested in a smaller plate of size M.

Inference results

We can first have a look at the trace:
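(The trace plot below was produced along these lines, assuming ArviZ is available:)

import arviz as az

# trace of the continuous parameters; the discrete latent z is inspected separately
az.plot_trace(obs_sample, var_names=['w', 'alpha'])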


Then, if we look at the posteriors of the latent variables, we observe that, oddly, all the observations were assigned to the first state.

The state probabilities are also off, with the first state having the highest probability.

The shape parameter also reaches the upper bound of its prior, allowing wide distributions.

With a Categorical distribution

I then tried to model the latent variables with a Categorical distribution, resulting in a new inference model:

import pytensor as pt  # assuming pt is the top-level pytensor module, so that pt.shared is available

with pm.Model() as inference_model_categorical:
    w = pm.Dirichlet("w", a=np.ones(max_state))
    latent_z = pm.Categorical('z', p=w, shape=N)

    alpha = pm.Uniform("alpha", lower=0, upper=0.1)
    # index the vector of state means by the integer state of each observation
    mu = pt.shared(expected_mean)[latent_z]

    # mu has shape (N,), so the observations are transposed to (M, N) to let it broadcast
    obs_distrib = pm.NegativeBinomial('obs', alpha=1/alpha, mu=mu, observed=obs_draw.T)

    obs_sample = pm.sample()
pm.model_to_graphviz(model=inference_model_categorical)

Again, the model does not display nested plates.

However, the inference now works fine.


Can someone please help me understand what is happening here?

PyMC doesn’t have a step sampler that works for the Multinomial distribution, whereas you’ll notice there is a specialized sampler for the Categorical distribution.

No clue about the nesting of the plates. There’s a heuristic used for it. Maybe if you add dims it will figure it out? Something along these lines (a sketch below; the coordinate names are made up).
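coords = {"obs_id": range(N), "state": range(max_state), "rep": range(M)}
with pm.Model(coords=coords) as dims_model:
    w = pm.Dirichlet("w", a=np.ones(max_state), dims="state")
    latent_z = pm.Categorical("z", p=w, dims="obs_id")
    alpha = pm.Uniform("alpha", lower=0, upper=0.1)
    mu = pt.shared(expected_mean)[latent_z]
    # observed data is (M, N), hence the ("rep", "obs_id") dims
    pm.NegativeBinomial("obs", alpha=1/alpha, mu=mu, observed=obs_draw.T, dims=("rep", "obs_id"))
pm.model_to_graphviz(model=dims_model)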

Thank you very much for your help
These are the outputs I get when using the categorical model and the multinomial model, respectively:

Categorical

Multinomial

So if I understand correctly, a sampler is chosen to sample from the Multinomial but it does not work? I am not sure I get the issue here. So, with PyMC you can only draw from a Multinomial but not sample from it? If that is the case, shouldn’t I receive a warning when trying to sample from such a model?

Also, how can I know which of the distributions supported by PyMC I can sample from?

For the nested plates, I guess it’s just a display issue, as inference works fine with the model using Categorical distributions. FYI, I tried with dims too and it still does not display nested plates.

Also, I wonder whether this is the best, most efficient way to index a numpy array storing the Negative Binomial means of each state (expected_mean) with the Categorical pytensor object:

latent_z = pm.Categorical('z', p=w, shape=N)
mu = pt.shared(expected_mean)[latent_z]
pm.NegativeBinomial('obs', alpha=1/alpha, mu=mu, observed=obs_draw.T)
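An alternative I considered (a sketch; I don’t know whether it is actually faster, as I’d expect both to compile to the same indexing op) is indexing a constant tensor instead of a shared variable:

import pytensor.tensor as ptt

# as_tensor_variable wraps the numpy array as a constant in the graph
mu = ptt.as_tensor_variable(expected_mean)[latent_z]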

Thank you for your help

In general you can sample any distribution, but sometimes the sampler will fail. This cannot be known in advance; you will notice it when r_hat is very high across chains or the samples seem “stuck”.
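For example, a quick check with ArviZ might look like this (a sketch):

import arviz as az

# r_hat close to 1 means the chains agree; values well above 1.01 signal trouble
print(az.summary(obs_sample, var_names=["w", "alpha"])[["r_hat", "ess_bulk"]])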

Furthermore, non-NUTS samplers (which discrete variables must necessarily use) can fail more quietly than the NUTS sampler (to which continuous variables are automatically assigned whenever we can get gradients).

The only case I know of that fails all the time is the Multinomial, because it’s the only discrete multivariate variable we have and it has a very specific constraint (all values must add up to n). It would need a sampler that is aware of this fact. We could perhaps raise an error.
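The constraint is easy to see on prior draws (a minimal sketch):

import pymc as pm

draws = pm.draw(pm.Multinomial.dist(n=1, p=[0.2, 0.3, 0.5]), draws=4)
print(draws)               # one-hot rows, e.g. [0 1 0]
print(draws.sum(axis=-1))  # always equal to n = 1; a generic proposal would violate this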

Note you can still use Multinomial just fine as a likelihood.
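For example (a sketch with made-up count data):

import numpy as np
import pymc as pm

counts = np.array([[3, 5, 2], [1, 7, 2]])  # hypothetical observed counts, each row summing to n = 10

with pm.Model():
    p = pm.Dirichlet("p", a=np.ones(3))
    # observed variables don't need a step sampler, so the sum-to-n constraint is no problem here
    pm.Multinomial("counts", n=10, p=p, observed=counts)
    idata = pm.sample()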

I opened an issue here: Warn or raise when trying to sample a Multinomial variable · Issue #7548 · pymc-devs/pymc · GitHub