Trouble putting together a regression model in pymc3

Atrus619 · June 7, 2019, 6:42pm

Hello, I am attempting to build a relatively simple model in pymc3, but the RAM usage is exploding and the model won’t even compile. I think I am clearly doing something wrong, but am unable to understand what exactly. Can someone help point me in the right direction? Thanks!

with Model() as model:
    # Priors
    psi_N = Uniform('psi_N', lower=0, upper=1)
    Binomial_N = Binomial('Binomial_N', n=1, p=psi_N)
    mu_N_noise = StudentT('mu_N_noise', nu=3, mu=0, sd=1)
    mu_N_A = StudentT('mu_N_A', nu=3, mu=0, sd=1, shape=num_A)
    mu_N_RS = StudentT('mu_N_RS', nu=3, mu=0, sd=1)
    mu_N_MM = StudentT('mu_N_MM', nu=3, mu=0, sd=1)
    mu_N_TC = StudentT('mu_N_IP', nu=3, mu=0, sd=1)

    # Primary variables, Frequency (N) and Severity (X)
    N = Binomial_N * tt.exp(mu_N_noise + mu_N_A[AgeBand] + mu_N_RS * RS17 + mu_N_MM * MM + mu_N_TC * Total_Count)

    # Model error
    sigma_y = Normal('sigma_y', mu=0, sd=1e3)

    # Data likelihood
    y_like = Normal('y_like', mu=N, sd=sigma_y, observed=y_train.values)

lucianopaz · June 8, 2019, 1:08pm

What’s y_train's shape? One quick option to reduce memory consumption is to sample(cores=1). When using more than one core, new processes are forked, data is copied across processes, and the overall memory consumption scales with the number of cores.
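The reason the shape matters can be sketched with a little arithmetic. The snippet below is only an illustration under an assumption the thread does not confirm: that one of the inputs might be a column of shape (n, 1) while the observed data has shape (n,). The helpers broadcast_shape and dense_gib are hypothetical names written for this sketch; they mimic NumPy-style broadcasting rules in plain Python and are not part of pymc3 or theano.

```python
from itertools import zip_longest
from math import prod


def broadcast_shape(*shapes):
    """Return the shape NumPy-style broadcasting would produce, or raise ValueError."""
    result = []
    # Align shapes on their trailing dimensions, padding missing ones with 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} do not broadcast")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


def dense_gib(shape, itemsize=8):
    """Memory in GiB for a dense array of this shape (float64 by default)."""
    return prod(shape) * itemsize / 2**30


n = 50_000  # record count reported later in the thread
for mu_shape in [(n,), (n, 1)]:  # (n, 1) is the hypothetical mismatched case
    out = broadcast_shape((n,), mu_shape)
    print(f"observed {(n,)} vs mu {mu_shape} -> {out}, {dense_gib(out):.4f} GiB")
```

If the shapes match, value - mu stays a vector of n elements. If one side were a column, the same subtraction would broadcast to an n by n matrix, which at n = 50,000 and float64 is about 18.6 GiB for a single intermediate array. Printing y_train.values.shape and the shapes of AgeBand, RS17, MM and Total_Count is a cheap way to rule this out.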

Gon_F · June 8, 2019, 5:26pm

Yeah, this seems like a standard model to build in PyMC3, so the data input is probably just too large. What errors specifically are you getting?

Atrus619 · June 10, 2019, 4:45pm

I am feeding about 50,000 records to this model. When I run the snippet posted above, the RAM on my computer (8 cores, 64 GB RAM) fills up. Then the RAM usage by Python drops, but the memory still seems to be occupied somewhere, until eventually the run errors out (with most of the RAM apparently still in use, possibly cached or leaked). The error, based on the trace:

Traceback (most recent call last):
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 18, in <module>
    y_like = Normal('y_like', mu=N, sd=sigma_y, observed=y_train.values)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 42, in __new__
    return model.Var(name, dist, data, total_size)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\model.py", line 839, in Var
    total_size=total_size, model=self)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\model.py", line 1327, in __init__
    self.logp_sum_unscaledt = distribution.logp_sum(data)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\distribution.py", line 119, in logp_sum
    return tt.sum(self.logp(*args, **kwargs))
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\distributions\continuous.py", line 480, in logp
    return bound((-tau * (value - mu)**2 + tt.log(tau / np.pi / 2.)) / 2.,
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\theano\tensor\var.py", line 147, in __sub__
    return theano.tensor.basic.sub(self, other)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\theano\gof\op.py", line 674, in __call__
    required = thunk()
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\theano\gof\op.py", line 862, in rval
    thunk()
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\theano\gof\cc.py", line 1735, in __call__
    reraise(exc_type, exc_value, exc_trace)
  File "C:\Users\CN149325\AppData\Local\Continuum\anaconda3\lib\site-packages\six.py", line 693, in reraise
    raise value
MemoryError: None

lucianopaz · June 10, 2019, 8:54pm

Several places can eat up a lot of memory during the Theano graph construction. You can try changing some Theano flags to disable optimizations; this could lower the memory footprint at the cost of longer runtime.

I also recommend you take a look at this study of memory allocation, which covers Python in general and Theano in particular.
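The reply does not say which flags are meant, so the following is only a sketch under that caveat. Theano reads a comma-separated THEANO_FLAGS environment variable when it is first imported, and optimizer=fast_compile is one documented setting that applies fewer graph optimizations. Picking this particular flag is an assumption, and whether it helps with the error above is untested here.

```python
import os

# Assumption: these are illustrative flags, not the ones the reply had in mind.
# They must be set before the first `import theano` / `import pymc3` in the process,
# because Theano reads THEANO_FLAGS once at import time.
flags = {
    "optimizer": "fast_compile",  # apply fewer graph optimizations
}

# THEANO_FLAGS is a comma-separated list of key=value pairs.
os.environ["THEANO_FLAGS"] = ",".join(f"{key}={value}" for key, value in flags.items())
print(os.environ["THEANO_FLAGS"])
```

Setting the variable in the shell before launching Python, or putting the same options in a .theanorc file, has the same effect and avoids depending on import order inside a notebook.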
