Hi all

I’ve recently begun dabbling with PyMC3 for some modelling problems and have implemented a basic linear regression model. After experimenting on my laptop and trying to scale up the dataset (N=20,000, D=2,000), I’m hitting an out-of-memory error when using NUTS to sample the weights. I’m not very familiar with the implementation, nor with sampling methods in general, and was wondering whether there is some kind of batch equivalent (as with reparameterized VI) for the sampling methods (i.e. Metropolis, NUTS). The error is below, and it makes sense why it’s happening (my GPU has only 2 GB…)

It seems to be trying to push the entire dataset onto the GPU at once, which is clearly infeasible. Oddly, the ADVI initialisation stage completes without any memory errors, but it fails immediately afterwards:

```
Finished [100%]: Average Loss = 23,867
ERROR (theano.gof.opt): Optimization failure due to: constant_folding
ERROR (theano.gof.opt): node: GpuFromHost(TensorConstant{[[ 0.02855…01668083]]})
ERROR (theano.gof.opt): TRACEBACK:
```

The final traceback looks like:

```
pygpu.gpuarray.GpuArrayException: b'cuMemAlloc: CUDA_ERROR_OUT_OF_MEMORY: out of memory'
Apply node that caused the error: GpuFromHost(TensorConstant{[[ 0.02855…01668083]]})
Toposort index: 2
Inputs types: [TensorType(float64, matrix)]
Inputs shapes: [(20000, 2000)]
Inputs strides: [(8, 160000)]
Inputs values: ['not shown']
Outputs clients: [[GpuGemv{inplace=True}(GpuAllocEmpty{dtype='float64', context_name=None}.0, TensorConstant{1.0}, GpuFromHost.0, GpuSubtensor{int64:int64:}.0, TensorConstant{0.0}), GpuGemv{inplace=True}(GpuAllocEmpty{dtype='float64', context_name=None}.0, TensorConstant{1.0}, GpuFromHost.0, GpuSubtensor{int64:int64:}.0, TensorConstant{0.0})]]
```
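Doing the arithmetic on the shapes in that traceback, the design matrix alone is ~305 MB in float64 before any of Theano's working copies, so I can see why a 2 GB card fills up. Presumably casting the data to float32 (and setting Theano's `floatX=float32`) would at least halve that footprint:

```python
import numpy as np

# Shapes taken from the traceback above
N, D = 20000, 2000
mb = 1024 ** 2

bytes_f64 = N * D * np.dtype('float64').itemsize
bytes_f32 = N * D * np.dtype('float32').itemsize
print(round(bytes_f64 / mb, 1))  # 305.2 MB as float64
print(round(bytes_f32 / mb, 1))  # 152.6 MB as float32
```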

Could anyone suggest ways to mitigate this without buying bigger hardware? Is batch training applicable to NUTS or Metropolis? The model is a simple Bayesian linear regression with learned weight and noise variances:

```
model = pm.Model()
with model:
    # Hyper-priors
    mu_w = pm.Normal('mu_w', mu=0., sd=100 ** 2)
    sigma_w = pm.HalfCauchy('sigma_w', 100)
    sigma_y = pm.HalfCauchy('sigma_y', beta=sigmaEps, testval=sigmaInit)
    # Priors
    w = pm.Normal('w', mu=mu_w, sd=sigma_w, shape=D)
    # Likelihood model
    y = pm.math.dot(w, X_train.T)
    y_obs = pm.Normal('y_obs', mu=y, sd=sigma_y, observed=y_train)
    trace = pm.sample(draws=1000, init="auto", n_init=10000, njobs=n_jobs)
```

As an aside, fullrank_advi runs incredibly slowly compared to NUTS, but it would be wonderful if I could use it as an alternative with batch training (I’m primarily interested in acquiring a usable predictive posterior, so plain mean-field ADVI isn’t really an option). Perhaps grouped VI (http://docs.pymc.io/api/inference.html#pymc3.variational.opvi.Group) or normalizing flows (http://docs.pymc.io/notebooks/normalizing_flows_overview.html) would help? That said, I’ve probably only scratched the surface!

Hardware is a laptop with an i7 6700, 16 GB RAM, and a GTX 960M with 2 GB of VRAM.

Thanks all!