Slow sampling in a Hierarchical Logistic Regression example

You don’t need to wrap model_input and model_output in shared variables if they never change. You can vectorize the sampling of mu_intercept and mu_slope by drawing them from a single Normal with shape=2, and do the same for sigma_intercept and sigma_slope. I don’t know what cat is, but if it is an integer index you can drop the cat_theano shared variable and index with cat directly. These changes will speed up your computation a little, but most of the work happens in the pm.math.sigmoid(...) step, so running chains in parallel on multiple cores will give you the biggest speed-up.
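As a sketch of the indexing suggestion — assuming `cat` is an integer array of group labels (the name is taken from the question; the group count and values here are made up) — NumPy/Theano fancy indexing lets you use it directly, with no `theano.shared` wrapper:

```python
import numpy as np

# Hypothetical per-group intercepts (3 groups) and an integer group
# label for each of 10 observations; `cat` is assumed to be an
# integer index array, as described in the question.
intercepts = np.array([0.5, -1.0, 2.0])
cat = np.array([0, 2, 1, 0, 1, 2, 2, 0, 1, 0])

# Fancy indexing picks the right intercept for every observation in
# one vectorized operation; the same expression works on a PyMC3
# random variable, e.g. intercept[cat] inside the model.
obs_intercept = intercepts[cat]

print(obs_intercept.shape)  # (10,)
```

Inside the model block the same pattern would read `intercept[cat] + ...`, letting Theano broadcast the per-group parameter over observations.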

>>> import numpy as np
>>> import theano
>>> import theano.tensor as tt
>>> import pymc3 as pm
>>> y_data = np.zeros(60000)
>>> y_data[:20000] = np.ones(20000)
>>> model_input = np.random.randn(60000, 4)
>>> model_output = y_data
>>> with pm.Model() as model:
...  mu = pm.Normal("mu", mu=0, sd=10, shape=2)
...  sigma = pm.Uniform("sigma", 0., 10., testval=2., shape=2)
...  intercept = pm.Normal("intercept", mu=mu[0], tau=sigma[0], shape=60000)
...  slope = pm.Normal("slope", mu=mu[1], tau=sigma[1], shape=(60000, 4))
...  p = pm.math.sigmoid(intercept + pm.math.sum(slope * model_input, 1))
...  like = pm.Bernoulli("like", p, observed=model_output)
...
>>> with model:
...  trace = pm.sample(500, init='advi', n_init=5000, chains=2, tune=500, target_accept=0.95)
...
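To see why the sigmoid step dominates, here is the same linear predictor computed in plain NumPy on hypothetical draws — one intercept and one slope row per observation, reduced over the 4 predictors, mirroring `pm.math.sigmoid(intercept + pm.math.sum(slope * model_input, 1))`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 60000, 4

model_input = rng.standard_normal((n, k))
intercept = rng.standard_normal(n)      # one intercept per observation
slope = rng.standard_normal((n, k))     # one slope row per observation

# Row-wise multiply-reduce plus 60000 elementwise exp calls -- this
# is repeated on every gradient evaluation during sampling, which is
# why it dominates the runtime.
linear = intercept + (slope * model_input).sum(axis=1)
p = 1.0 / (1.0 + np.exp(-linear))

print(p.shape)  # (60000,)
```

Since this cost is paid per chain, passing `cores=2` (or more) to `pm.sample` in recent PyMC3 versions runs the chains in parallel for a near-linear speed-up.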

(I am using a Core i7 CPU on a Linux machine.)