Maybe related to this thread.
Sampling when using NUTS and tensor.dot operators slows down considerably as more chains are sampled. Here is an example:
Data generation for a simple linear model:

```python
import theano
import theano.tensor as T
import numpy as np
import pymc3 as pm3

# test data
N = 1000
x1 = 10 + 2 * np.random.randn(N, 1)
x2 = 5 + 2 * np.random.randn(N, 1)
X = np.c_[np.ones_like(x1), x1, x2]
Y = X.dot([10, 1, -1]) + 2 * np.random.randn(N)

# sanity check: the OLS solution recovers the true coefficients
print(np.linalg.inv(X.T.dot(X)).dot(X.T.dot(Y)))
```
and creating two identical pymc3 models:
```python
with pm3.Model() as model1:
    beta = pm3.Flat('beta', shape=X.shape[1])
    sigma = pm3.Exponential('sigma', lam=.1)
    xb = T.dot(X, beta)
    like = pm3.Normal('like', mu=xb, sd=sigma, observed=Y)
```
```python
with pm3.Model() as model2:
    beta = pm3.Flat('beta', shape=X.shape[1])
    sigma = pm3.Exponential('sigma', lam=.1)
    xb = beta[0] + beta[1] * X[:, 1] + beta[2] * X[:, 2]
    like = pm3.Normal('like', mu=xb, sd=sigma, observed=Y)
```
and then sampling each model for 100 samples with tune=50 and chains=1, 2, and 3. The times are given below:
| Chains | Model 1  | Model 2 |
|--------|----------|---------|
| 3      | 1min 41s | 994 ms  |
I should add that:

- there are plenty of free CPUs and plenty of free memory
- the same timings occur if X is a Theano shared variable
- this doesn't happen with Metropolis, but I didn't try any other samplers
- this doesn't improve with more samples/tuning