Trouble putting together a regression model in pymc3

Hello, I am attempting to build a relatively simple model in PyMC3, but RAM usage explodes and the model won't even compile. I am clearly doing something wrong, but I can't work out what. Can someone point me in the right direction? Thanks!

import theano.tensor as tt
from pymc3 import Model, Uniform, Binomial, StudentT, Normal

with Model() as model:
    # Priors
    psi_N = Uniform('psi_N', lower=0, upper=1)
    Binomial_N = Binomial('Binomial_N', n=1, p=psi_N)
    mu_N_noise = StudentT('mu_N_noise', nu=3, mu=0, sd=1)
    mu_N_A = StudentT('mu_N_A', nu=3, mu=0, sd=1, shape=num_A)
    mu_N_RS = StudentT('mu_N_RS', nu=3, mu=0, sd=1)
    mu_N_MM = StudentT('mu_N_MM', nu=3, mu=0, sd=1)
    mu_N_TC = StudentT('mu_N_TC', nu=3, mu=0, sd=1)

    # Primary variables, Frequency (N) and Severity (X)
    N = Binomial_N * tt.exp(mu_N_noise + mu_N_A[AgeBand] + mu_N_RS * RS17 + mu_N_MM * MM + mu_N_TC * Total_Count)

    # Model error
    sigma_y = Normal('sigma_y', mu=0, sd=1e3)

    # Data likelihood
    y_like = Normal('y_like', mu=N, sd=sigma_y, observed=y_train.values)

What's y_train's shape? One quick way to reduce memory consumption is to call sample(cores=1). With more than one core, new processes are forked, the data is copied across processes, and overall memory consumption scales with the number of cores.
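A minimal sketch of what that call might look like, assuming the model above compiles (the draw and tune counts are arbitrary placeholders, not recommendations):

from pymc3 import sample

with model:
    # cores=1 keeps sampling in a single process, so the data is held in memory only once
    trace = sample(draws=1000, tune=1000, cores=1)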

Yeah, this looks like a fairly standard model for PyMC3, so the data input is probably just too large. What errors are you getting, specifically?
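One cheap way to test the "data is too large" theory is to rebuild the model on a small slice of the inputs and see whether it compiles. This is a hypothetical sketch; the names mirror the model above and the slice size is arbitrary:

# Hypothetical smoke test: build the same model on the first 1,000 rows only.
n_test = 1000
y_small = y_train.values[:n_test]
AgeBand_small = AgeBand[:n_test]
RS17_small = RS17[:n_test]
MM_small = MM[:n_test]
Total_Count_small = Total_Count[:n_test]
# Also confirm every covariate is 1-D; a stray 2-D array can broadcast into a huge matrix.
print(y_small.shape)

If the small version compiles and samples, memory really is the bottleneck; if it still fails, the problem is elsewhere in the model specification.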

The data is about 50,000 records. When I run the snippet posted above, the RAM on my machine (8 cores, 64 GB) fills up; Python's reported usage then drops, but the memory still appears to be occupied somewhere until the run eventually errors out (most of the RAM still looks used, possibly cached or leaked). The error, based on the trace:

Traceback (most recent call last):
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 18, in <module>
    y_like = Normal('y_like', mu=N, sd=sigma_y, observed=y_train.values)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 42, in __new__
    return model.Var(name, dist, data, total_size)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\model.py", line 839, in Var
    total_size=total_size, model=self)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\model.py", line 1327, in __init__
    self.logp_sum_unscaledt = distribution.logp_sum(data)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 119, in logp_sum
    return tt.sum(self.logp(*args, **kwargs))
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\continuous.py", line 480, in logp
    return bound((-tau * (value - mu)**2 + tt.log(tau / np.pi / 2.)) / 2.,
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\theano\tensor\var.py", line 147, in __sub__
    return theano.tensor.basic.sub(self, other)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\theano\gof\op.py", line 674, in __call__
    required = thunk()
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\theano\gof\op.py", line 862, in rval
    thunk()
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\theano\gof\cc.py", line 1735, in __call__
    reraise(exc_type, exc_value, exc_trace)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\six.py", line 693, in reraise
    raise value
MemoryError: None

There are a few places that can eat up a lot of memory during Theano graph construction. You can try changing some Theano flags to disable graph optimizations, which can lower the memory footprint at the cost of longer runtime.
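As a sketch, one way to do that is to switch Theano to its cheaper optimization pipeline before the model is built, either in code or via the THEANO_FLAGS environment variable (whether this alone is enough will depend on your setup):

import theano

# Cheaper graph optimization: compilation takes less time and memory,
# at the cost of slower execution once the graph is built.
theano.config.optimizer = 'fast_compile'

# Equivalent, set in the environment before Python starts:
#   THEANO_FLAGS="optimizer=fast_compile"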

I also recommend taking a look at this study of memory allocation, which covers Python in general but applies the ideas to Theano in particular.