Multivariate categorical with different probabilities

chartl · April 8, 2019, 9:16pm

What’s the most efficient way to move from a single-sample Dirichlet-Categorical (D-C) distribution:

import numpy as np
import pymc3 as pm

alpha = [1.] * 5
with pm.Model() as model:
    dprior = pm.Dirichlet('dir', np.array(alpha), shape=(5,))
    picks = pm.Categorical('pick', dprior)
    spp = pm.sample_prior_predictive(50)
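For reference, the single-sample model describes a two-step generative process: draw a probability vector from the Dirichlet, then draw one category index from it. A minimal NumPy sketch of what each prior predictive draw corresponds to is below. It is an illustration, not PyMC3's internal sampling code; the function name dirichlet_categorical_prior is hypothetical, and it assumes NumPy 1.17 or later for np.random.default_rng.

```python
import numpy as np


def dirichlet_categorical_prior(alpha, n_draws, seed=None):
    """Hypothetical helper: ancestral sampling of p ~ Dirichlet(alpha), pick ~ Categorical(p)."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    # One probability vector per draw: shape (n_draws, len(alpha))
    p = rng.dirichlet(alpha, size=n_draws)
    # One category index per draw, using that draw's own probability vector
    picks = np.array([rng.choice(len(alpha), p=p_row) for p_row in p])
    return p, picks


p, picks = dirichlet_categorical_prior([1.] * 5, n_draws=50, seed=0)
print(p.shape, picks.shape)  # (50, 5) (50,)
```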

to a multivariate version:

alpha = [1.] * 5
k = 3
with pm.Model() as model:
    dprior = pm.Dirichlet('dir', np.array(alpha), shape=(k, 5))
    # picks = pm.Categorical('pick', dprior) # DOES NOT WORK
    # picks = pm.Categorical('pick', dprior, shape=(k,)) # DOES NOT WORK
    # picks = pm.Categorical('pick', dprior, shape=(5,)) # DOES NOT WORK
    # picks = pm.Categorical('pick', dprior, shape=(5,k)) # DOES NOT WORK (but unique error)
    # picks = pm.Categorical('pick', dprior, shape=(k,5)) # DOES NOT WORK
    spp = pm.sample_prior_predictive(50)

What’s the trick here?
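In the multivariate version, the intent is k independent Dirichlet-Categorical pairs: dir holds k probability vectors over 5 categories, and pick holds one category per row, so the Categorical has one value per row, i.e. shape (k,), rather than (k, 5) or (5,). A NumPy sketch of the shapes one batch of prior predictive draws should have, assuming NumPy 1.17 or later; the function name is hypothetical and this is not PyMC3 code.

```python
import numpy as np


def multivariate_dirichlet_categorical(alpha, k, n_draws, seed=None):
    """Hypothetical helper: k independent Dirichlet-Categorical pairs per draw.

    p has shape (n_draws, k, K): each of the k rows is its own probability vector.
    picks has shape (n_draws, k): one category index per row.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    n_cat = alpha.shape[0]
    # size=(n_draws, k) yields an array of shape (n_draws, k, n_cat)
    p = rng.dirichlet(alpha, size=(n_draws, k))
    picks = np.empty((n_draws, k), dtype=int)
    for d in range(n_draws):
        for j in range(k):
            # Row j of draw d picks its category from its own probabilities
            picks[d, j] = rng.choice(n_cat, p=p[d, j])
    return p, picks


p, picks = multivariate_dirichlet_categorical([1.] * 5, k=3, n_draws=50, seed=0)
print(p.shape, picks.shape)  # (50, 3, 5) (50, 3)
```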

The unique error looks like a collision between shape and size kwargs:

~/anaconda3/lib/python3.7/site-packages/pymc3/distributions/dist_math.py in random_choice(*args, **kwargs)
    320     """
    321     p = kwargs.pop('p')
--> 322     size = kwargs.pop('size')
    323     k = p.shape[-1]
    324
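The frame above pops size from kwargs without a default, so the helper relies on every caller passing size. As an alternative under stated assumptions (a swapped-in technique, not PyMC3's actual fix), a categorical sampler can treat size as optional and vectorize over every leading axis of p with inverse-CDF sampling: draw one uniform per output element and count how many entries of that row's cumulative sum are at or below it. The name random_choice_nd is hypothetical.

```python
import numpy as np


def random_choice_nd(p, size=None, rng=None):
    """Hypothetical vectorized categorical sampler (not PyMC3's random_choice).

    p    : array of shape (..., K); each slice along the last axis sums to 1.
    size : optional extra leading dimensions; when omitted, one sample is
           drawn per probability vector in p.
    Returns integer category indices of shape tuple(size) + p.shape[:-1].
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    batch_shape = p.shape[:-1]
    if size is not None:
        batch_shape = tuple(int(s) for s in np.atleast_1d(size)) + batch_shape
    # Broadcast p so every output element has its own probability vector
    p = np.broadcast_to(p, batch_shape + p.shape[-1:])
    cdf = np.cumsum(p, axis=-1)
    # One uniform draw per output element, compared against its row's CDF
    u = rng.random(batch_shape + (1,))
    idx = (cdf <= u).sum(axis=-1)
    # Guard against round-off leaving cdf[..., -1] slightly below 1
    return np.minimum(idx, p.shape[-1] - 1)


rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=3)  # shape (3, 5), like dir with k=3
print(random_choice_nd(probs, rng=rng).shape)           # (3,)
print(random_choice_nd(probs, size=50, rng=rng).shape)  # (50, 3)
```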

junpenglao · April 9, 2019, 5:14am

Hmm, it is likely a shape error in the random generation of the Dirichlet. @lucianopaz could you have a look?
Also, something like an LDA implementation with pymc3 used to work, but prior predictive sampling was never tested.

lucianopaz · April 9, 2019, 5:47am

Works for me on the master branch.

Should work in principle but doesn’t. I’ll look into the problem a bit more.

chartl · April 9, 2019, 6:21am

Not for me:

import pymc3 as pm
import numpy as np
alpha = [1.] * 5
k = 3
with pm.Model() as model:
    dprior = pm.Dirichlet('dir', np.array(alpha), shape=(k, 5))
    picks = pm.Categorical('pick', dprior, shape=(k,)) # DOES NOT WORK
    spp = pm.sample_prior_predictive(50)

[...]
~/anaconda3/lib/python3.7/site-packages/pymc3/distributions/dist_math.py in <listcomp>(.0)
    325     if p.ndim > 1:
    326         # If a 2d vector of probabilities is passed return a sample for each row of categorical probability
--> 327         samples = np.array([np.random.choice(k, p=p_) for p_ in p])
    328     else:
    329         samples = np.random.choice(k, p=p, size=size)

mtrand.pyx in mtrand.RandomState.choice()

ValueError: object too deep for desired array
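np.random.choice only accepts a 1-D p. The list comprehension above iterates over the first axis of p, so if p arrives with an extra leading axis of prior predictive draws, shape (50, k, 5) here, each p_ is a 2-D (k, 5) slice, which most likely explains this ValueError. A minimal fix under that assumption is to flatten all leading axes, sample once per row, and restore the shape. choice_per_row is a hypothetical name, not a PyMC3 function.

```python
import numpy as np


def choice_per_row(p, rng=None):
    """Hypothetical helper: one categorical sample per probability vector.

    p has shape (..., K). np.random.choice (and Generator.choice) only take a
    1-D p, so all leading axes are flattened first and restored afterwards.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    n_cat = p.shape[-1]
    flat = p.reshape(-1, n_cat)  # (number of probability vectors, K)
    samples = np.array([rng.choice(n_cat, p=row) for row in flat])
    return samples.reshape(p.shape[:-1])


# 50 prior predictive draws of a (k=3, 5) Dirichlet, as in the failing model
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(5), size=(50, 3))
print(choice_per_row(p, rng=rng).shape)  # (50, 3)
```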

This is with a fresh git pull (6be2b30c) and pip setup.py install, running OS X and Python 3.7.1 (Anaconda).

lucianopaz · April 9, 2019, 6:54am

Maybe pip is not performing the install because we don’t change pymc3’s version number between releases. From pip’s perspective, every commit pulled after 3.6 and before 3.7 looks like the same 3.6 version, so pip concludes there is nothing to update and does not install anything. You should uninstall pymc3 and then run pip install -e on your pulled repo (for example, pip install -e . from the repository root). That way, any change in the repo will always be reflected in the loaded pymc3 package.
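To check which copy of pymc3 Python actually imports (an installed snapshot or your git checkout), you can ask the import system for the module's origin without importing it. Below is a small hypothetical helper built on importlib.util.find_spec from the standard library; after an editable install, the path it reports for pymc3 should point inside your checkout. Printing pymc3.__file__ after importing it gives the same information.

```python
import importlib.util


def module_origin(name):
    """Hypothetical helper: path a top-level module would be imported from."""
    spec = importlib.util.find_spec(name)
    # find_spec returns None when the module cannot be found at all
    return None if spec is None else spec.origin


print(module_origin("pymc3"))  # None if pymc3 is not installed
```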

chartl · April 9, 2019, 6:59am

Wow! Crazy that pip would ignore changes even with the install -e syntax. I can confirm, after uninstalling and re-installing, that

pm.Categorical('pick', dprior, shape=(k,))

works! Thanks
