Multivariate categorical with observed data

I think my question is similar to this older question; however, I am also including observed data, which I believe changes the solution.
I am trying to instantiate an observed multivariate categorical with a Dirichlet prior. I would like to assign different probabilities to each variable in the categorical; however, the code below fails with the error AssertionError: Could not broadcast dimensions

import pymc as pm
import numpy as np

with pm.Model() as model:
    observed = [[0,1,1],[1,0,1]]
    prior_probs = np.ones(shape=(2,2))
    prior = pm.Dirichlet("prior", a=prior_probs)
    posterior = pm.Categorical("posterior", prior, observed=observed)

Intuitively, I thought it should work, since I am providing observed data of shape (2,3) and the prior probabilities have shape (2, #categories), in this case (2,2).

This may be of help: Distribution Dimensionality — PyMC 5.3.0 documentation

Thanks! There was an error in the first version of the question, which I have just edited: I had the shape of the prior_probs variable as (1,2), but it should be (2,2), as shown now.

You probably need to transpose the categorical observations. Batched dimensions have to be on the left. I assume you have 3 pairs of 2 categorical variables?

If your Dirichlet prior varied across observations, it would have to have shape (3, 2, 2), in case that helps
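
To make the "batched dimensions on the left" point concrete, here is a rough NumPy-only sketch of the shape check. It assumes that the batch shape of p (everything except its last axis) has to broadcast against the shape of the observed values. The helper batch_compatible is made up for illustration and is not a PyMC function.

```python
import numpy as np

observed = np.array([[0, 1, 1], [1, 0, 1]])  # shape (2, 3)


def batch_compatible(p_shape, value_shape):
    """Hypothetical helper: does p's batch shape broadcast with the values?"""
    try:
        # Drop the last axis of p (the categories axis) and compare the rest.
        np.broadcast_shapes(p_shape[:-1], value_shape)
        return True
    except ValueError:
        return False


# Original setup: batch shape (2,) against values of shape (2, 3).
as_posted = batch_compatible((2, 2), observed.shape)
# Transposed observations: batch shape (2,) against values of shape (3, 2).
transposed = batch_compatible((2, 2), observed.T.shape)
# Prior varying across observations: batch shape (3, 2) against (3, 2).
per_observation = batch_compatible((3, 2, 2), observed.T.shape)

print(as_posted, transposed, per_observation)
```

Under that assumption, only the setup as posted fails the check, because broadcasting aligns shapes from the right and 2 does not match 3.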

I’m not sure I understand; in the example above I have two observed categorical variables of size 3. I don’t need the Dirichlet prior to vary across observations, I want to set a single prior per variable.
I thought that the batched dimension of 2 should be on the left in the above example:
observed data dimension (2,3)
prior dimension (2,2)
I feel like I’m missing something very obvious, but I’m not sure what it is.

As a side note, I can also parameterize the Dirichlet by setting the shape directly, which I thought should lead to a variable “prior” of shape (2,2):

with pm.Model() as model:
    observed = [[0,1,1],[1,0,1]]
    prior = pm.Dirichlet("prior", a=np.ones(2), shape=(2,))
    posterior = pm.Categorical("posterior", prior, observed=observed)
    trace = pm.sample()

This code runs and does not throw an error.
However, if I look in the trace object, I see that the prior variable is actually of shape (1,2), not (2,2)

In the simplest case of a categorical you have a vector of probabilities for a scalar output: categorical(p=[0.5, 0.5]) can yield a single 0 or 1, whereas categorical(p=[0.3, 0.3, 0.4]) can yield a single 0, 1, or 2.

The last axis of p gives you the event probabilities. If you have a (3, 2) p matrix you get 3 scalars in {0, 1}; if you have a (2, 3) p matrix you get 2 scalars in {0, 1, 2}. Does that help?
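
A small NumPy sketch of that rule, in case it helps anyone reading later. This is not how PyMC draws samples internally; draw_categorical is a hypothetical stand-in that uses inverse-CDF sampling, only to show that the output shape is p's shape with the last axis removed.

```python
import numpy as np


def draw_categorical(p, rng):
    """Hypothetical sampler: one draw per probability vector on p's last axis."""
    p = np.asarray(p, dtype=float)
    n_categories = p.shape[-1]
    # One uniform number per batch entry, with a trailing axis for comparison.
    u = rng.random(p.shape[:-1] + (1,))
    # Count how many cumulative probabilities the uniform draw exceeds.
    idx = (u > np.cumsum(p, axis=-1)).sum(axis=-1)
    # Guard against floating-point round-off in the final cumulative sum.
    return np.minimum(idx, n_categories - 1)


rng = np.random.default_rng(0)
scalar_draw = draw_categorical([0.3, 0.3, 0.4], rng)         # one value in {0, 1, 2}
three_binary = draw_categorical(np.full((3, 2), 0.5), rng)   # 3 values in {0, 1}
two_ternary = draw_categorical(np.full((2, 3), 1 / 3), rng)  # 2 values in {0, 1, 2}

print(scalar_draw.shape, three_binary.shape, two_ternary.shape)
```

The three shapes come out as (), (3,), and (2,): the categories axis is consumed and only the batch axes remain.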

Hi Ricardo,
Yes, this matches my understanding as well. Thanks for your help!
I was able to achieve the desired behavior by adding an extra dimension to the Dirichlet probability matrix. Specifically, I changed the prior_probs line to the following (where pt is pytensor.tensor):
prior_probs = pt.expand_dims(np.ones(shape=(2,2)),1)
This adds an extra length-1 dimension between the variables dimension and the categories dimension, giving prior_probs the shape (2,1,2).
The approach allows the code to run and produce the desired results. The only downside is that the posterior samples for the “prior” variable also have the extra dimension, but I can work around that.

You can expand the dims after you define the Dirichlet and then it won’t show up in the trace.

Thanks! Yep, that worked: the dims are expanded without the extra dimension showing up in the trace.