Multivariate categorical with different probabilities

#1

What’s the most efficient way to move from a single-sample Dirichlet-Categorical (D-C) model:

import numpy as np
import pymc3 as pm

alpha = [1.] * 5
with pm.Model() as model:
    dprior = pm.Dirichlet('dir', np.array(alpha), shape=(5,))
    picks = pm.Categorical('pick', dprior)
    spp = pm.sample_prior_predictive(50)  # 50 draws from the prior predictive
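For intuition, prior predictive sampling of this model amounts to drawing one probability vector from the Dirichlet and then one category index from that vector, repeated for each draw. A NumPy-only sketch, assuming the same flat alpha of ones and a fixed seed; the helper name is hypothetical and this is not how PyMC3 implements it:

```python
import numpy as np

def dirichlet_categorical_prior(alpha, draws, seed=0):
    """Hypothetical helper: emulate prior predictive draws of a Dirichlet-Categorical model."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    # One probability vector per draw: shape (draws, len(alpha))
    p = rng.dirichlet(alpha, size=draws)
    # One category index per draw, sampled from that draw's probability vector
    picks = np.array([rng.choice(len(alpha), p=p_i) for p_i in p])
    return p, picks

p, picks = dirichlet_categorical_prior([1.0] * 5, draws=50)
print(p.shape, picks.shape)  # (50, 5) (50,)
```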

to a multivariate version, with k independent probability vectors and one pick per vector:

alpha = [1.] * 5
k = 3
with pm.Model() as model:
    dprior = pm.Dirichlet('dir', np.array(alpha), shape=(k, 5))
    # picks = pm.Categorical('pick', dprior) # DOES NOT WORK
    # picks = pm.Categorical('pick', dprior, shape=(k,)) # DOES NOT WORK
    # picks = pm.Categorical('pick', dprior, shape=(5,)) # DOES NOT WORK
    # picks = pm.Categorical('pick', dprior, shape=(5,k)) # DOES NOT WORK (but unique error)
    # picks = pm.Categorical('pick', dprior, shape=(k,5)) # DOES NOT WORK
    spp = pm.sample_prior_predictive(50)

What’s the trick here?


The unique error (from shape=(5,k)) looks like a collision between the shape and size kwargs:

~/anaconda3/lib/python3.7/site-packages/pymc3/distributions/dist_math.py in random_choice(*args, **kwargs)
    320     """
    321     p = kwargs.pop('p')
--> 322     size = kwargs.pop('size')
    323     k = p.shape[-1]
    324 
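For intuition about what the multivariate version should produce: with a (k, 5) probability matrix, each row is its own categorical distribution, so one draw should return k indices, one per row. A NumPy sketch using inverse-CDF sampling along the last axis; this is a generic technique, not PyMC3's Categorical.random, and batched_categorical is a hypothetical name:

```python
import numpy as np

def batched_categorical(p, rng):
    """Draw one category index per probability vector in p of shape (..., K).

    Returns an integer array with the leading shape of p, i.e. shape (...).
    """
    p = np.asarray(p, dtype=float)
    cdf = np.cumsum(p, axis=-1)
    cdf /= cdf[..., -1:]  # normalize so the last CDF entry is exactly 1
    u = rng.random(p.shape[:-1] + (1,))  # one uniform in [0, 1) per row
    # Number of CDF entries <= u is the index of the sampled category
    return (u >= cdf).sum(axis=-1)

rng = np.random.default_rng(0)
k = 3
p = rng.dirichlet(np.ones(5), size=k)  # shape (k, 5), like the 'dir' prior
picks = batched_categorical(p, rng)    # shape (k,), one pick per row
print(p.shape, picks.shape)            # (3, 5) (3,)
```

Because the sampler only touches the last axis, the same function also handles an extra leading axis of prior predictive draws, e.g. p of shape (50, k, 5) gives picks of shape (50, k).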

#2

Hmm, it is likely a shape error in the random generation of the Dirichlet. @lucianopaz could you have a look?
Also, something like an LDA implementation in PyMC3 used to work, but prior predictive sampling was never tested.


#3

Works for me on the master branch.

It should work in principle, but doesn’t. I’ll look into the problem a bit more.


#4

Not for me:

import pymc3 as pm
import numpy as np
alpha = [1.] * 5
k = 3
with pm.Model() as model:
    dprior = pm.Dirichlet('dir', np.array(alpha), shape=(k, 5))
    picks = pm.Categorical('pick', dprior, shape=(k,)) # DOES NOT WORK
    spp = pm.sample_prior_predictive(50)

[...]
~/anaconda3/lib/python3.7/site-packages/pymc3/distributions/dist_math.py in <listcomp>(.0)
    325     if p.ndim > 1:
    326         # If a 2d vector of probabilities is passed return a sample for each row of categorical probability
--> 327         samples = np.array([np.random.choice(k, p=p_) for p_ in p])
    328     else:
    329         samples = np.random.choice(k, p=p, size=size)

mtrand.pyx in mtrand.RandomState.choice()

ValueError: object too deep for desired array

This is with a fresh git pull (6be2b30c) and pip setup.py install. Running OS X and Python 3.7.1 (Anaconda).
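The traceback points at the row-wise loop in random_choice: when p.ndim > 1 it iterates over the first axis and calls np.random.choice on each row, which only works if p is 2-D. One plausible reading (an assumption; the thread does not confirm it) is that with 50 prior predictive draws p arrives with shape (50, k, 5), so each p_ is still 2-D and NumPy rejects it. A sketch of a shape-agnostic alternative that flattens the leading axes, samples per row, and restores the shape; choice_per_row is hypothetical, not PyMC3 code:

```python
import numpy as np

def choice_per_row(p, rng):
    """Sample one index per probability vector in p of shape (..., K)."""
    p = np.asarray(p, dtype=float)
    k = p.shape[-1]
    flat = p.reshape(-1, k)  # collapse all leading axes into one
    samples = np.array([rng.choice(k, p=row) for row in flat])
    return samples.reshape(p.shape[:-1])  # restore the leading shape

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(5), size=(50, 3))  # (draws, k, categories)

# Looping over only the first axis hands a (3, 5) array to choice, which fails
raised = False
try:
    [rng.choice(5, p=p_) for p_ in p]
except ValueError as err:
    raised = True
    print("ValueError:", err)

print(choice_per_row(p, rng).shape)  # (50, 3)
```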


#5

Maybe pip is not performing the install because we don’t change pymc3’s version number between releases. From pip’s perspective, every commit pulled after 3.6 and before 3.7 is the same 3.6 version, so pip decides there is nothing to update and does not install anything. You should uninstall pymc3 and then run pip install -e on your pulled repo (for example, pip install -e . from the repo root). That way, any change in the repo will be reflected in the imported pymc3 package.
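A quick way to check which copy of a package Python actually imports, before and after reinstalling. This sketch uses only the standard library's importlib.util.find_spec; the helper name is hypothetical. After an editable install, the path reported for pymc3 should point inside your cloned repo rather than into site-packages:

```python
import importlib.util

def package_location(name):
    """Return the file Python would import for the top-level package `name`, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

# Compare this path with your git checkout after reinstalling
print(package_location("pymc3"))
```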


#6

Wow! Crazy that pip would ignore changes even with the install -e syntax. I can confirm, after uninstalling and re-installing, that

pm.Categorical('pick', dprior, shape=(k,))

works! Thanks
