Hi - new to pymc3, and currently all of the tutorial examples that I run are going very slowly. As an example, I took this code, reduced the number of samples from 10,000 to 1,000, but it is still taking around 10 minutes to run (~10 draws/s.) There’s another script that my coworker can run in 15 minutes with a similar machine, but when I try to run it it was estimated to take 25 hours.
Here are the software versions I’m running:
pymc3 3.8
theano 1.0.4
Python 3.7.4
My processor is an Intel Core 15-8350U 1.7GHz 8 cores (no discrete GPU)
Here is the code I was running (takes ~10 minutes to complete.) (Note, I also wrapped it in an “if __name__
== ‘__main__
’:” statement to avoid getting “RuntimeError: The communication pipe between the main process and its spawned children is broken.”)
Any help would be appreciated! There are a few different projects I’d like to use pymc3 for, but right now it’s hard to even do tutorials because everything takes to long to sample.
import numpy as np, pandas as pd, matplotlib.pyplot as plt, seaborn as sns
if __name__ == '__main__':
sns.set_context('paper')
sns.set_style('darkgrid')
import pymc3 as pm, theano.tensor as tt
# simulate data from a known mixture distribution
np.random.seed(12345) # set random seed for reproducibility
k = 3
ndata = 500
spread = 5
centers = np.array([-spread, 0, spread])
# simulate data from mixture distribution
v = np.random.randint(0, k, ndata)
data = centers[v] + np.random.randn(ndata)
plt.hist(data);
# setup model
model = pm.Model()
with model:
# cluster sizes
p = pm.Dirichlet('p', a=np.array([1., 1., 1.]), shape=k)
# ensure all clusters have some points
p_min_potential = pm.Potential('p_min_potential', tt.switch(tt.min(p) < .1, -np.inf, 0))
# cluster centers
means = pm.Normal('means', mu=[0, 0, 0], sd=15, shape=k)
# break symmetry
order_means_potential = pm.Potential('order_means_potential',
tt.switch(means[1]-means[0] < 0, -np.inf, 0)
+ tt.switch(means[2]-means[1] < 0, -np.inf, 0))
# measurement error
sd = pm.Uniform('sd', lower=0, upper=20)
# latent cluster of each observation
category = pm.Categorical('category',
p=p,
shape=ndata)
# likelihood for each observed value
points = pm.Normal('obs',
mu=means[category],
sd=sd,
observed=data)
# fit model
with model:
step1 = pm.Metropolis(vars=[p, sd, means])
step2 = pm.ElemwiseCategorical(vars=[category], values=[0, 1, 2])
tr = pm.sample(1000, step=[step1, step2])