Slow sampling in pymc3 (on "tutorial problem")

Hi

I am a beginner and currently working through:
https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3

I am trying to reproduce the “challenger disaster” analysis with code:

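import pymc3 as pm
import theano.tensor as tt

# `data` is assumed to be the two-column Challenger array from the tutorial:
# column 0 is temperature, column 1 is the damage-incident indicator.
temperature = data[:, 0].astype(float)
D = data[:, 1].astype(float)

with pm.Model() as model:

    beta = pm.Normal('beta', 0, 0.001, testval=0)
    alpha = pm.Normal('alpha', 0, 0.001, testval=0)
    # Logistic link: probability of a damage incident at each temperature.
    p = pm.Deterministic('p', 1.0 / (1. + tt.exp(beta * temperature + alpha)))

    observed = pm.Bernoulli('bernoulli_obs', p, observed=D)
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(120000, step=step, start=start)
    # Discard the first 100000 draws and keep every second draw after that.
    burned_trace = trace[100000::2]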

With this I get sampling rates of around 100 draws/s. In the notebook linked above the same sampling takes only about 16 s; for me it takes around 50 min.
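
To put the two figures side by side, here is a rough back-of-the-envelope sketch. It assumes the 16 s in the notebook covers a single chain of 120000 draws, which the post does not state; how the per-chain time adds up to the reported 50 min depends on the number of chains and whether they run in parallel.

```python
n_draws = 120_000

local_rate = 100.0        # draws/s reported in the post
tutorial_seconds = 16.0   # wall time reported for the tutorial run (assumed: one chain)

# Time for one chain at the local rate, and the rate implied by the tutorial run.
local_seconds_per_chain = n_draws / local_rate
tutorial_rate = n_draws / tutorial_seconds

print(f"local: {local_seconds_per_chain / 60:.0f} min per chain")
print(f"tutorial: {tutorial_rate:.0f} draws/s")
print(f"slowdown: {tutorial_rate / local_rate:.0f}x")
```

Under that assumption the gap is well over an order of magnitude, which is more than a change of sampler alone would be expected to explain.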

Note that I ran the same analysis before in pymc, where the performance was similar to the “literature”, i.e. the numbers in the tutorial.

I am running this in a Jupyter Notebook
Python version: ‘3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]’
PyMC3: 3.7

Has anyone had a similar experience and found a way around it?

The following are generally no longer recommended (and we should probably work with Cam to update all the code):

  • pm.find_MAP()
  • pm.Metropolis()

I suggest you just sample with the defaults: trace = pm.sample(). Also, if you are using the default sampler (i.e., NUTS), you don't need thinning or burn-in.
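
To make the thinning and burn-in remark concrete, here is a minimal sketch of what the slice trace[100000::2] from the question does. It uses a plain Python list as a stand-in for the trace (an assumption for illustration only; a real trace holds parameter values, not indices).

```python
# Stand-in for one chain of 120000 draws: each "draw" is just its own index,
# so it is easy to see which draws survive the slice.
n_draws = 120_000
draws = list(range(n_draws))

burn_in = 100_000  # draws discarded from the start of the chain
thin = 2           # keep every second draw after burn-in

kept = draws[burn_in::thin]

print(len(kept))   # 10000
print(kept[:3])    # [100000, 100002, 100004]
```

So only 10000 of the 120000 draws are actually used. With a plain pm.sample() call the tuning draws should be dropped automatically (assuming the default discard_tuned_samples=True), so no manual slice is needed.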

I removed the find_MAP starting point and used the default step method (i.e. no step argument); see the code below.

import pymc3 as pm
import theano.tensor as tt

temperature = data[:, 0].astype(float)
D = data[:, 1].astype(float)

with pm.Model() as model:

    beta = pm.Normal('beta', 0, 0.001, testval=0)
    alpha = pm.Normal('alpha', 0, 0.001, testval=0)
    p = pm.Deterministic('p', 1.0 / (1. + tt.exp(beta * temperature + alpha)))

    observed = pm.Bernoulli('bernoulli_obs', p, observed=D)
    trace = pm.sample(120000)

Still, the sampling rate stays below 100 draws/s. Do you have another idea?

EDIT:
I should mention that when loading pymc3 I get the warnings shown at the bottom of this post; they mention severe performance degradation…
So I tried to install “m2w64-toolchain”, but then all hell broke loose and I couldn’t get pymc running at all any more (sorry, I didn’t log the errors; something about the Theano initialization was not right…)

After reinstalling everything (including Anaconda) I am now back where I started…

I am quite desperate. I would really like to learn more about this technique, but waiting 5 min for every beginner’s mistake I have to uncover is unbearable…

best regards

> 
> WARNING (theano.configdefaults): g++ not available, if using conda: `conda install m2w64-toolchain`
> C:\Users\Lenovo\Anaconda3\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
>   warnings.warn("DeprecationWarning: there is no c++ compiler."
> WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
> WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
> C:\Users\Lenovo\Anaconda3\lib\site-packages\dask\config.py:168: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
>   data = yaml.load(f.read()) or {}
> C:\Users\Lenovo\Anaconda3\lib\site-packages\distributed\config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
>   defaults = yaml.load(f)
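
The first warnings above say that Theano found no g++ and will fall back to Python implementations, with severely degraded performance. A minimal sketch for checking this from inside the notebook follows. The helper names find_compiler and theano_cxx are made up for illustration; shutil.which is from the standard library, and theano.config.cxx is the setting the warning itself refers to.

```python
import shutil


def find_compiler(names=("g++",)):
    """Return the full path of the first listed compiler found on PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def theano_cxx():
    """Return Theano's configured C++ compiler string, or None if Theano cannot be imported."""
    try:
        import theano
    except Exception:  # not installed, or failing to initialise
        return None
    return theano.config.cxx


compiler_path = find_compiler()
if compiler_path is None:
    print("g++ is not on PATH for this Python process")
else:
    print("g++ found at:", compiler_path)

print("theano.config.cxx:", repr(theano_cxx()))
```

If no g++ is found on PATH, or theano.config.cxx comes back as an empty string, that would be consistent with the warning and would point to the missing compiler, rather than the choice of sampler, as the likely reason for the slow sampling.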