Memory error with sample_posterior_predictive

For some reason, I’m getting memory errors when running the following code on a simple regression model:

posterior_samples_regression = pm.sample_posterior_predictive(trace, model = model, samples=500)

This has worked with previous models. The full traceback and model are below.

Error traceback:

100%|██████████| 500/500 [06:02<00:00, 1.10s/it]

MemoryError                               Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 posterior_samples_regression = pm.sample_posterior_predictive(trace, model = model, samples=500)

~/anaconda3/lib/python3.7/site-packages/pymc3/sampling.py in sample_posterior_predictive(trace, samples, model, vars, size, random_seed, progressbar)
   1144     indices.close()
   1145 
-> 1146     return {k: np.asarray(v) for k, v in ppc_trace.items()}
   1147 
   1148 

~/anaconda3/lib/python3.7/site-packages/pymc3/sampling.py in <dictcomp>(.0)
   1144     indices.close()
   1145 
-> 1146     return {k: np.asarray(v) for k, v in ppc_trace.items()}
   1147 
   1148 

~/anaconda3/lib/python3.7/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    499 
    500     """
--> 501     return array(a, dtype, copy=False, order=order)
    502 
    503 

MemoryError: 

Model:

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

# shared variable so the design matrix can be swapped out later (e.g. for X_test)
b_input = theano.shared(np.asarray(X_train))

with pm.Model() as model:
    # priors
    a = pm.Normal('intercept', mu=0, sd=5)
    beta = pm.Normal('betas', mu=0, sd=5, shape=X_train.shape[1])  # one coefficient per feature
    sd = pm.HalfCauchy('sd', 5)

    # linear predictor
    mu = a + tt.dot(b_input, beta)

    # likelihood
    y = pm.Normal('y', mu=mu, sd=sd, observed=Y_train)
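
For the posterior predictive on the test set, the shared design matrix would presumably be swapped out first. A minimal sketch of that step, assuming X_test is defined (this swap is not shown in the original post):

# point the shared design matrix at the test data before drawing the
# posterior predictive (assumption: X_test is defined elsewhere)
b_input.set_value(np.asarray(X_test))

posterior_samples_regression = pm.sample_posterior_predictive(trace, model=model, samples=500)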

Has anyone else seen this specifically with sampling the posterior predictive?

What are the shapes of X_train and Y_train?

It was (40384, 54). I ran it with 100 samples and it worked. Then I ran it with 500 samples again and it worked after failing a few times. A bug maybe?

Sorry, that was X_test. X_train was (127534, 54) and Y_train was (127534, 1).

OK, this means that the output of the posterior predictive should be a (500, 40384) array of float64. The current implementation does not preallocate the memory for that: it builds a Python list and then converts it to an array, so at its peak it uses roughly twice that much memory. That would explain the intermittency. How much RAM do you have?
Overall, I think the implementation could be slightly improved with some preallocation after the first draw.
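
Roughly, the difference would look like this. This is an illustrative sketch only, not the actual pymc3 internals; draw here stands in for whatever produces a single posterior predictive draw:

import numpy as np

def collect_with_list(draw, n_draws):
    # current behaviour: accumulate per-draw arrays in a Python list, then
    # convert; during np.asarray() the list and the new array coexist,
    # so peak memory is roughly twice the size of the final array
    values = [draw(i) for i in range(n_draws)]
    return np.asarray(values)

def collect_with_preallocation(draw, n_draws):
    # proposed behaviour: use the first draw to infer shape and dtype,
    # allocate the full output once, and fill it in place
    first = np.asarray(draw(0))
    out = np.empty((n_draws,) + first.shape, dtype=first.dtype)
    out[0] = first
    for i in range(1, n_draws):
        out[i] = draw(i)
    return out

With the second variant, peak memory stays close to the size of the final output array plus a single draw.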


32 gigs

The full array should only take up about 25 GB, so it should fit. It doesn't because of the preallocation issue, where you end up using almost twice that much memory. It should be easy to add some preallocation logic after making the very first draw (which lets you infer the output's shape). Would you be interested in making a PR?
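
In the meantime, a possible user-side workaround along the same lines is to draw the posterior predictive in smaller batches and copy each batch into a preallocated array. This is a sketch, assuming the model and trace from above; the batch size is arbitrary:

import numpy as np
import pymc3 as pm

n_samples = 500
batch_size = 100                          # arbitrary; small enough to fit comfortably

posterior_y = None
with model:
    for start in range(0, n_samples, batch_size):
        ppc = pm.sample_posterior_predictive(
            trace, samples=batch_size, random_seed=start
        )
        draws = ppc['y']                  # 'y' is the observed variable defined above
        if posterior_y is None:
            # preallocate the full result once the per-draw shape is known
            posterior_y = np.empty((n_samples,) + draws.shape[1:], dtype=draws.dtype)
        posterior_y[start:start + batch_size] = draws

Peak memory then stays near the size of the final array plus one batch, rather than roughly double.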

I apologize, but I'm not sure what a 'PR' is.

It is an acronym for pull request, the mechanism used on GitHub to contribute changes to the pymc3 code base. If you feel adventurous, you can read some tutorials on how to use git and GitHub. If not, I'll try to make one when I get the chance.

Would you like me to post my model code and the error?

There's no need, thanks.