Sampling running very slowly for all models?

ProcessEngineer · April 27, 2020, 2:14pm

Hi, I'm new to pymc3, and all of the tutorial examples I run are going very slowly. As an example, I took this code and reduced the number of samples from 10,000 to 1,000, but it still takes around 10 minutes to run (~10 draws/s). There's another script that my coworker can run in 15 minutes on a similar machine, but when I try to run it, the estimated time is 25 hours.

Here are the software versions I’m running:
pymc3 3.8
theano 1.0.4
Python 3.7.4

My processor is an Intel Core i5-8350U, 1.7 GHz, 8 cores (no discrete GPU).

Here is the code I was running (it takes ~10 minutes to complete). Note: I also wrapped it in an "if __name__ == '__main__':" block to avoid getting "RuntimeError: The communication pipe between the main process and its spawned children is broken."

Any help would be appreciated! There are a few different projects I'd like to use pymc3 for, but right now it's hard to even do the tutorials because everything takes too long to sample.

import numpy as np, pandas as pd, matplotlib.pyplot as plt, seaborn as sns

if __name__ == '__main__':

    sns.set_context('paper')
    sns.set_style('darkgrid')

    import pymc3 as pm, theano.tensor as tt

    # simulate data from a known mixture distribution
    np.random.seed(12345)  # set random seed for reproducibility

    k = 3
    ndata = 500
    spread = 5
    centers = np.array([-spread, 0, spread])

    # simulate data from mixture distribution
    v = np.random.randint(0, k, ndata)
    data = centers[v] + np.random.randn(ndata)

    plt.hist(data)

    # setup model
    model = pm.Model()
    with model:
        # cluster sizes
        p = pm.Dirichlet('p', a=np.array([1., 1., 1.]), shape=k)
        # ensure all clusters have some points
        p_min_potential = pm.Potential('p_min_potential', tt.switch(tt.min(p) < .1, -np.inf, 0))

        # cluster centers
        means = pm.Normal('means', mu=[0, 0, 0], sd=15, shape=k)
        # break symmetry
        order_means_potential = pm.Potential('order_means_potential',
                                             tt.switch(means[1] - means[0] < 0, -np.inf, 0)
                                             + tt.switch(means[2] - means[1] < 0, -np.inf, 0))

        # measurement error
        sd = pm.Uniform('sd', lower=0, upper=20)

        # latent cluster of each observation
        category = pm.Categorical('category',
                                  p=p,
                                  shape=ndata)

        # likelihood for each observed value
        points = pm.Normal('obs',
                           mu=means[category],
                           sd=sd,
                           observed=data)

    # fit model
    with model:
        step1 = pm.Metropolis(vars=[p, sd, means])
        step2 = pm.ElemwiseCategorical(vars=[category], values=[0, 1, 2])
        tr = pm.sample(1000, step=[step1, step2])

ProcessEngineer · April 27, 2020, 6:56pm

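Discovered the issue: I needed to install the m2w64-toolchain. Without a C++ compiler, Theano falls back to its much slower Python implementations instead of compiled C code. After running "conda install m2w64-toolchain", everything runs much, much faster.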
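For anyone seeing the same slowdown, here is a minimal diagnostic sketch, not an official pymc3 or Theano tool. It assumes that an empty `theano.config.cxx` means Theano found no C++ compiler, and it uses `shutil.which("g++")` only as a rough proxy for whether `g++` is on PATH. The helper names `find_cxx_compiler` and `theano_cxx_setting` are hypothetical.

```python
import shutil


def find_cxx_compiler(name="g++"):
    """Return the path of the named compiler if it is on PATH, else None.

    Rough proxy only (assumption): Theano's default compiler is g++,
    but this is not Theano's own detection logic.
    """
    return shutil.which(name)


def theano_cxx_setting():
    """Return Theano's configured C++ compiler string, or None if Theano can't be imported."""
    try:
        import theano
    except Exception:  # ImportError, or an import-time failure on an incompatible setup
        return None
    # An empty string means Theano has no compiler and will use its slow Python implementations.
    return theano.config.cxx


if __name__ == "__main__":
    print("g++ on PATH:", find_cxx_compiler() or "not found")
    setting = theano_cxx_setting()
    if setting is None:
        print("Theano could not be imported in this environment")
    elif setting == "":
        print("Theano has no C++ compiler configured; expect slow sampling")
    else:
        print("Theano compiler:", setting)
```

On Windows with conda, the compiler would typically come from the m2w64-toolchain package, so it is worth running this from the same activated environment you sample in.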
