Out of memory for simple Categorical model


I am trying to estimate the probabilities of a categorical distribution. My data is a numpy array (I actually have 3 categories, encoded as 0,1,2):

array([1, 1, 1, ..., 0, 0, 1], dtype=int32)

The shape of the array is:

(233310,)

My model definition is as follows (I have also tried a Dirichlet prior, with no difference):

import pymc3 as pm

with pm.Model() as categ_model:
    theta = pm.Uniform('theta', 0, 1, shape=3)
    # X is the int32 array of observed categories (0, 1, 2) shown above
    obs = pm.Categorical(name='obs', p=theta, observed=X)
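
Regarding the Dirichlet prior mentioned above: a Dirichlet prior is conjugate to the categorical likelihood, so the posterior over the probabilities can be written down directly from the category counts, which gives a sampling-free cross-check. The following is only a minimal sketch, assuming a flat Dirichlet(1, 1, 1) prior and a synthetic stand-in array `X_demo` (hypothetical, not the actual data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: 233,310 labels in {0, 1, 2}
X_demo = rng.integers(0, 3, size=233_310, dtype=np.int32)

# Assumed flat Dirichlet(1, 1, 1) prior over the three probabilities
alpha_prior = np.ones(3)

# The categorical likelihood depends on the data only through the counts
counts = np.bincount(X_demo, minlength=3)

# Conjugate update: posterior is Dirichlet(alpha_prior + counts)
alpha_post = alpha_prior + counts
posterior_mean = alpha_post / alpha_post.sum()

# Draws from the posterior, if samples are needed downstream
posterior_draws = rng.dirichlet(alpha_post, size=1_000)

print("counts:", counts)
print("posterior mean:", posterior_mean)
```

With this many observations the posterior mean should sit very close to the empirical category frequencies, so it can serve as a reference value for whatever the sampler returns.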

I get out-of-memory errors during sampling if the number of tuning steps exceeds 1000. I have also looked at this other post, but the solution mentioned there (reshaping X so that it has shape (233310, 1)) does not work for me.
What am I doing wrong?

Thank you

What version of PyMC are you using? With the latest version (from the repository) I can sample just fine; each process seems to consume around 300 MB of RAM:

import numpy as np
import pymc as pm

data = np.random.randint(3, size=233_310, dtype="int32")

with pm.Model() as m:
    theta = pm.Uniform("theta", 0, 1, shape=3)
    obs = pm.Categorical("obs", p=theta, observed=data)
with m:
    trace = pm.sample()

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [theta]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 89 seconds.
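
For a sense of scale, the observed array is small compared with the memory figures discussed here. A quick sketch, reusing the same synthetic data as above, that prints its shape and raw size (what this implies about where the memory goes is an assumption, not something measured here):

```python
import numpy as np

# Same synthetic data as in the example above
data = np.random.randint(3, size=233_310, dtype="int32")

print(data.shape)   # (233310,)
print(data.nbytes)  # 933240 bytes: 233,310 values at 4 bytes each
```

At under 1 MB, the data alone is unlikely to account for an out-of-memory error, which points toward the installed version or the sampling setup rather than the array.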

@ricardoV94 My PyMC3 version is 3.11.5 and my Python version is 3.8.10.
I installed PyMC3 using pip.
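
Since the two posts above use different packages (`pymc3` versus `pymc`), it can help to confirm which ones are actually installed in the active environment. A minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "pymc3" is the PyPI name of the 3.x series; "pymc" is used by newer releases
for name in ("pymc3", "pymc"):
    print(name, installed_version(name))
```

`importlib.metadata` is available from Python 3.8 onward, so this should also run on the 3.8.10 interpreter mentioned above.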