Hi,

I’ve just started using PyMC3 and I’m trying to build a simple multiple linear regression model (one response, 17 predictors), but my Jupyter kernel keeps dying or raising a MemoryError. I’m sure it’s something about the way I’ve designed the model, but I can’t work out what. My design matrix is large-ish, with shape (300000, 17), but not huge: scikit-learn’s ridge regression fits it very quickly.

Here is my code:

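```python
import pymc3 as pm

# data_X (pandas DataFrame, 300000 x 17) and data_y (the response)
# are assumed to be loaded already
basic_model = pm.Model()
with basic_model:
    # Priors for the intercept, the coefficients and the noise scale
    alpha = pm.Normal('alpha', mu=0, sd=1)
    beta = pm.Normal('beta', mu=0, sd=1, shape=(len(data_X.columns), 1))
    sigma = pm.HalfNormal('sigma', sd=10)
    # Expected value of the outcome
    mu = alpha + pm.math.dot(data_X.values, beta)
    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=data_y.values)
```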

If I run the above model, I immediately get a MemoryError inside a Theano thunk function (see the stack trace at the end). If I limit the design matrix to ~10k rows, it doesn’t crash, but it quickly uses over 100 GB of RAM, and if I then call `pm.sample` it only manages 1-2 it/s.
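One thing that may be worth checking is the shapes. The sketch below uses NumPy only, with small random stand-in arrays (hypothetical data, not the real `data_X`/`data_y`), and assumes `data_y.values` is 1-D. With `beta` declared as `shape=(17, 1)`, `mu` is a column of shape `(n, 1)`; subtracting it from a 1-D target of shape `(n,)` broadcasts to an `(n, n)` matrix. That is the same `(value - mu)` term in `Normal.logp` where the stack trace fails, so it is a plausible (unconfirmed) source of the memory blow-up.

```python
import numpy as np

# Small random stand-ins for data_X / data_y (hypothetical data, n kept small)
n, k = 1000, 17
rng = np.random.RandomState(0)
X = rng.randn(n, k)
y = rng.randn(n)                  # 1-D target, like a pandas Series' .values

# beta declared with shape (k, 1), as in the model above
beta_col = rng.randn(k, 1)
mu_col = 0.5 + X.dot(beta_col)    # shape (n, 1)
resid_col = y - mu_col            # (n,) - (n, 1) broadcasts to (n, n)
print(mu_col.shape, resid_col.shape)

# beta declared with shape (k,) keeps everything 1-D
beta_vec = rng.randn(k)
mu_vec = 0.5 + X.dot(beta_vec)    # shape (n,)
resid_vec = y - mu_vec            # shape (n,)
print(mu_vec.shape, resid_vec.shape)

# Size of a single float64 (n, n) intermediate at the full 300000 rows
n_full = 300000
print(n_full * n_full * 8 / 1e9, "GB")
```

This prints `(1000, 1) (1000, 1000)`, then `(1000,) (1000,)`, then `720.0 GB`. If the shapes are the culprit, declaring `beta` with `shape=len(data_X.columns)` (or reshaping `data_y.values` into a column) should keep `mu` and the observations aligned; this is a suggestion to test, not a confirmed fix.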

I am running on a large AWS box. I’m aware of the previous issue with Amazon boxes discussed here; however, I can run the random_walk_deep_net notebook on my AWS box without any problems or leaks. I also get a related memory error in the Theano code when I run the model on my local Windows machine.

Any ideas would be super appreciated!

Thank you,

William

Environment:

- PyMC3 Version: 3.6
- Theano Version: 1.0.3
- Python Version: 3.6.8
- Operating system: AWS Linux

Stack trace below:

```
MemoryError                               Traceback (most recent call last)
 in ()
     15
     16 # Likelihood (sampling distribution) of observations
---> 17 Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=data_y.values)

~/.conda/envs/my_root/lib/python3.6/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
     40 total_size = kwargs.pop('total_size', None)
     41 dist = cls.dist(*args, **kwargs)
---> 42 return model.Var(name, dist, data, total_size)
     43 else:
     44 raise TypeError("Name needs to be a string but got: {}".format(name))

~/.conda/envs/my_root/lib/python3.6/site-packages/pymc3/model.py in Var(self, name, dist, data, total_size)
    837 var = ObservedRV(name=name, data=data,
    838 distribution=dist,
--> 839 total_size=total_size, model=self)
    840 self.observed_RVs.append(var)
    841 if var.missing_values:

~/.conda/envs/my_root/lib/python3.6/site-packages/pymc3/model.py in __init__(self, type, owner, index, name, data, distribution, total_size, model)
   1322
   1323 self.missing_values = data.missing_values
-> 1324 self.logp_elemwiset = distribution.logp(data)
   1325 # The logp might need scaling in minibatches.
   1326 # This is done in `Factor`

...

~/.conda/envs/my_root/lib/python3.6/site-packages/pymc3/distributions/continuous.py in logp(self, value)
    478 mu = self.mu
    479
--> 480 return bound((-tau * (value - mu)**2 + tt.log(tau / np.pi / 2.)) / 2.,
    481 sd > 0)
    482

~/.conda/envs/my_root/lib/python3.6/site-packages/theano/tensor/var.py in __sub__(self, other)
    145 # and the return value in that case
    146 try:
--> 147 return theano.tensor.basic.sub(self, other)
    148 except (NotImplementedError, AsTensorError):
    149 return NotImplemented

~/.conda/envs/my_root/lib/python3.6/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    672 thunk.outputs = [storage_map[v] for v in node.outputs]
    673
--> 674 required = thunk()
    675 assert not required # We provided all inputs
    676

~/.conda/envs/my_root/lib/python3.6/site-packages/theano/gof/op.py in rval()
    860
    861 def rval():
--> 862 thunk()
    863 for o in node.outputs:
    864 compute_map[o][0] = True

~/.conda/envs/my_root/lib/python3.6/site-packages/theano/gof/cc.py in __call__(self)
   1733 print(self.error_storage, file=sys.stderr)
   1734 raise
-> 1735 reraise(exc_type, exc_value, exc_trace)
   1736
   1737

~/.conda/envs/my_root/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691 if value.__traceback__ is not tb:
    692 raise value.with_traceback(tb)
--> 693 raise value
    694 finally:
    695 value = None
```