Memory Error with posterior_predictive_sample

jordan.howell2 · March 10, 2019, 4:03pm

For some reason, I’m getting memory errors when running the following code on a simple regression model:

posterior_samples_regression = pm.sample_posterior_predictive(trace, model = model, samples=500)

This has worked in previous models. The full error code and model are below.

Error Code:

100%|██████████| 500/500 [06:02<00:00, 1.10s/it]

MemoryError Traceback (most recent call last)
in
----> 1 posterior_samples_regression = pm.sample_posterior_predictive(trace, model = model, samples=500)

~/anaconda3/lib/python3.7/site-packages/pymc3/sampling.py in sample_posterior_predictive(trace, samples, model, vars, size, random_seed, progressbar)
1144 indices.close()
1145
→ 1146 return {k: np.asarray(v) for k, v in ppc_trace.items()}
1147
1148

~/anaconda3/lib/python3.7/site-packages/pymc3/sampling.py in (.0)
1144 indices.close()
1145
→ 1146 return {k: np.asarray(v) for k, v in ppc_trace.items()}
1147
1148

~/anaconda3/lib/python3.7/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
499
500 """
→ 501 return array(a, dtype, copy=False, order=order)
502
503

MemoryError:

Model:

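import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

# training features as a Theano shared variable
b_input = theano.shared(np.asarray(X_train))

with pm.Model() as model:
    # priors
    a = pm.Normal('intercept', mu=0, sd=5)
    beta = pm.Normal('betas', mu=0, sd=5)
    sd = pm.HalfCauchy('sd', 5)

    # linear predictor
    mu = a + tt.dot(beta, b_input)

    # likelihood
    y = pm.Normal('y', mu=mu, sd=sd, observed=Y_train)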

Has anyone else seen this specifically with sampling the posterior?

lucianopaz · March 10, 2019, 9:40pm

What are the shapes of X_train and Y_train?

jordan.howell2 · March 10, 2019, 11:45pm

It was (40384, 54). I ran it with 100 samples and it worked. Then I ran it with 500 samples again and it worked after failing a few times. A bug maybe?

jordan.howell2 · March 10, 2019, 11:46pm

Sorry, that was X_test. X_train was (127534, 54) and Y_train was (127534, 1).

lucianopaz · March 11, 2019, 5:20am

OK, this means that the output of the posterior predictive should be a (500, 40384) array of float64. The current implementation does not preallocate memory for it: it collects the draws in a list and then converts the list to an array, so peak usage is roughly twice that size. That would explain the intermittency. How much RAM do you have?
Overall, I think the implementation could be slightly improved with some preallocation after the first draw.
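
A minimal sketch of the difference between accumulating draws in a list and preallocating after the first draw. This is plain NumPy for illustration only, not the actual pymc3 code; `draw_one`, `collect_in_list`, and `collect_preallocated` are hypothetical names, and the shapes are kept tiny.

```python
import numpy as np

def draw_one(rng, n_obs):
    """Hypothetical stand-in for a single posterior predictive draw."""
    return rng.normal(size=n_obs)

def collect_in_list(rng, n_samples, n_obs):
    # Accumulate draws in a Python list, then convert at the end.
    # While np.asarray runs, the list of draws and the new array coexist,
    # so peak memory is roughly twice the size of the final result.
    draws = [draw_one(rng, n_obs) for _ in range(n_samples)]
    return np.asarray(draws)

def collect_preallocated(rng, n_samples, n_obs):
    # Make the first draw to learn the output shape and dtype,
    # then allocate the full result once and fill it in place.
    first = draw_one(rng, n_obs)
    out = np.empty((n_samples,) + first.shape, dtype=first.dtype)
    out[0] = first
    for i in range(1, n_samples):
        out[i] = draw_one(rng, n_obs)
    return out

# Same seed, so both strategies see the same sequence of draws.
a = collect_in_list(np.random.default_rng(0), n_samples=5, n_obs=3)
b = collect_preallocated(np.random.default_rng(0), n_samples=5, n_obs=3)
```

Both functions return the same array; only the peak memory needed to build it differs.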

jordan.howell2 · March 11, 2019, 7:45am

32 gigs

lucianopaz · March 11, 2019, 8:10am

The full array should take up only about 25 GB, so it should fit. It doesn’t because of the missing preallocation, which means you end up using almost twice the memory. It should be easy to add some preallocation logic after making the very first draw (which lets you infer the output’s shape). Would you be interested in making a PR?
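
The size estimate can be checked with a little arithmetic. The sketch below is an illustration rather than something posted in the thread: `array_gib` is a hypothetical helper, and the (500, 127534, 54) shape is an assumption. It is the shape you would get if the scalar `beta` broadcasts against the (127534, 54) training matrix, so that each draw holds one value per cell. That shape comes to roughly 25.7 GiB, in line with the 25 GB figure, whereas a (500, 40384) array comes to only about 0.15 GiB.

```python
import numpy as np

def array_gib(shape, dtype=np.float64):
    """Size in GiB of a dense array with the given shape and dtype."""
    n_items = int(np.prod(shape, dtype=np.int64))
    return n_items * np.dtype(dtype).itemsize / 2**30

# 500 draws, one value per row of X_test
per_row = array_gib((500, 40384))

# 500 draws, one value per cell of X_train (assumed shape, not confirmed in the thread)
per_cell = array_gib((500, 127534, 54))

print(f"{per_row:.2f} GiB vs {per_cell:.2f} GiB")
```

If the larger shape is the real one, doubling it during the list-to-array conversion would not fit in 32 GB of RAM.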

jordan.howell2 · March 11, 2019, 10:45am

I apologize but I’m not sure what a ‘PR’ is.

lucianopaz · March 11, 2019, 4:05pm

It is short for pull request, the term GitHub uses for a proposed contribution to a code base, in this case pymc3. If you feel adventurous, you can read some online tutorials on how to use git and GitHub. If not, I’ll try to make one when I get the chance.

jordan.howell2 · March 12, 2019, 10:17am

Would you like me to push up my model code and the error?

lucianopaz · March 12, 2019, 10:23am

There’s no need, thanks.
