# Multivariate categorical with different probabilities

What’s the most efficient way to move from a single-sample Dirichlet-Categorical (D-C) distribution:

``````
import numpy as np
import pymc3 as pm

alpha = [1.] * 5
with pm.Model() as model:
    dprior = pm.Dirichlet('dir', np.array(alpha), shape=(5,))
    picks = pm.Categorical('pick', dprior)
    spp = pm.sample_prior_predictive(50)
``````

to a multivariate version:

``````
import numpy as np
import pymc3 as pm

alpha = [1.] * 5
k = 3
with pm.Model() as model:
    dprior = pm.Dirichlet('dir', np.array(alpha), shape=(k, 5))
    # picks = pm.Categorical('pick', dprior) # DOES NOT WORK
    # picks = pm.Categorical('pick', dprior, shape=(k,)) # DOES NOT WORK
    # picks = pm.Categorical('pick', dprior, shape=(5,)) # DOES NOT WORK
    # picks = pm.Categorical('pick', dprior, shape=(5,k)) # DOES NOT WORK (but unique error)
    # picks = pm.Categorical('pick', dprior, shape=(k,5)) # DOES NOT WORK
    spp = pm.sample_prior_predictive(50)
``````

What’s the trick here?

The unique error looks like a collision between `shape` and `size` kwargs:

``````
~/anaconda3/lib/python3.7/site-packages/pymc3/distributions/dist_math.py in random_choice(*args, **kwargs)
320     """
321     p = kwargs.pop('p')
--> 322     size = kwargs.pop('size')
323     k = p.shape[-1]
324
``````

Hmm, it is likely a shape error in the random generation in Dirichlet. @lucianopaz could you have a look?
Also, something like an LDA implementation with pymc3 used to work, but prior predictive sampling was never tested.

Works for me on the master branch.

Should work in principle but doesn’t. I’ll look into the problem a bit more


Not for me:

``````
import pymc3 as pm
import numpy as np

alpha = [1.] * 5
k = 3
with pm.Model() as model:
    dprior = pm.Dirichlet('dir', np.array(alpha), shape=(k, 5))
    picks = pm.Categorical('pick', dprior, shape=(k,)) # DOES NOT WORK
    spp = pm.sample_prior_predictive(50)

[...]
~/anaconda3/lib/python3.7/site-packages/pymc3/distributions/dist_math.py in <listcomp>(.0)
325     if p.ndim > 1:
326         # If a 2d vector of probabilities is passed return a sample for each row of categorical probability
--> 327         samples = np.array([np.random.choice(k, p=p_) for p_ in p])
328     else:
329         samples = np.random.choice(k, p=p, size=size)

mtrand.pyx in mtrand.RandomState.choice()

ValueError: object too deep for desired array
``````
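One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: during prior predictive sampling the probabilities carry an extra leading sample dimension, e.g. shape `(50, k, 5)`, so each `p_` in the list comprehension is still 2-D and `np.random.choice` rejects it. The NumPy-only sketch below shows the intended behaviour, one categorical draw per probability vector regardless of how many leading dimensions there are. `sample_categorical_rows` is a hypothetical helper written for illustration, not a pymc3 function and not the fix that went into master.

```python
import numpy as np

def sample_categorical_rows(p, rng=None):
    """Draw one category index per probability vector along the last axis.

    Hypothetical helper for illustration only; not part of pymc3.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    n_categories = p.shape[-1]
    # Collapse all leading dimensions so every row is a 1-D probability vector.
    flat = p.reshape(-1, n_categories)
    draws = np.array([rng.choice(n_categories, p=row) for row in flat])
    # Restore the leading dimensions: one draw per probability vector.
    return draws.reshape(p.shape[:-1])

rng = np.random.default_rng(0)
k = 3
# Stand-in for 50 prior draws of a (k, 5) Dirichlet: shape (50, k, 5).
p = rng.dirichlet(np.ones(5), size=(50, k))
picks = sample_categorical_rows(p, rng)
print(picks.shape)
```

Flattening to 2-D before looping is what keeps every `p=` argument 1-D, which is the requirement the original list comprehension appears to violate when `p` has more than two dimensions.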

This is with a fresh `git pull` (`6be2b30c`) and `pip setup.py install`, running OSX and Python 3.7.1 (Anaconda).

Maybe pip is not performing the install because we don’t change pymc3’s version number between releases. From pip’s perspective, every commit pulled after 3.6 and before 3.7 looks like the same 3.6 version, so pip thinks there is nothing to update and does not install anything. You should uninstall pymc3 and then run `pip install -e .` from the root of your pulled repo. That way, you’ll always be sure that any change in the repo affects the loaded pymc3 package.
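To check which copy of a package Python is actually importing, a small sketch like the following can help. `describe_install` is a hypothetical helper name, not something pip or pymc3 provides; it only reads the standard `__version__` and `__file__` attributes. With an editable install, the reported path would be expected to point into the cloned repo rather than into `site-packages`.

```python
import importlib

def describe_install(module_name):
    """Report the version string and file location of an importable module.

    Hypothetical helper for illustration; it relies only on the conventional
    __version__ and __file__ attributes, which a module may not define.
    """
    module = importlib.import_module(module_name)
    return {
        "version": getattr(module, "__version__", None),
        "path": getattr(module, "__file__", None),
    }

if __name__ == "__main__":
    try:
        print(describe_install("pymc3"))
    except ImportError:
        print("pymc3 is not installed in this environment")
```

Because the version number stays the same between releases, the path is the more telling of the two fields here.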


Wow! Crazy that pip would ignore changes even with the `install -e` syntax. I can confirm, after uninstalling and re-installing, that

`pm.Categorical('pick', dprior, shape=(k,))`

works! Thanks