Slow sampling in pymc3 (on "tutorial problem")

Hi

I am a beginner and currently working through:
https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3

I am trying to reproduce the “challenger disaster” analysis with the following code:

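import pymc3 as pm
import theano.tensor as tt

# `data` is assumed to be the Challenger array loaded as in the tutorial
# (column 0: temperature, column 1: damage incident).
temperature = data[:, 0].astype(float)
D = data[:, 1].astype(float)

with pm.Model() as model:
    beta = pm.Normal('beta', 0, 0.001, testval=0)
    alpha = pm.Normal('alpha', 0, 0.001, testval=0)
    # Logistic link: probability of a damage incident at each temperature.
    p = pm.Deterministic('p', 1.0 / (1. + tt.exp(beta * temperature + alpha)))

    observed = pm.Bernoulli('bernoulli_obs', p, observed=D)
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(120000, step=step, start=start)
    # Drop the first 100000 draws as burn-in and keep every second draw.
    burned_trace = trace[100000::2]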
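
One detail in the model above may be worth checking, independent of sampling speed. As far as I know, the second positional argument of pm.Normal in PyMC3 is the standard deviation, whereas PyMC2 parameterized the normal by its precision tau, and the tutorial's prior is written in terms of tau = 0.001. The sketch below uses only the standard library and the relation tau = 1/sigma^2 to show how far apart the two readings of 0.001 are. The helper names are made up for illustration.

```python
import math


def sigma_from_tau(tau):
    """Standard deviation of a normal distribution with precision tau."""
    return 1.0 / math.sqrt(tau)


def tau_from_sigma(sigma):
    """Precision of a normal distribution with standard deviation sigma."""
    return 1.0 / sigma ** 2


# Reading 0.001 as a precision gives a very wide prior ...
wide_sigma = sigma_from_tau(0.001)
# ... while reading 0.001 as a standard deviation gives a very narrow one.
narrow_tau = tau_from_sigma(0.001)

print(f"tau = 0.001   -> sigma = {wide_sigma:.1f}")
print(f"sigma = 0.001 -> tau   = {narrow_tau:.0f}")
```

If the prior is meant to match the tutorial, passing the value by keyword (tau=0.001) avoids the ambiguity.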

With this I get a sampling rate of around 100 draws/s. In the notebook linked above the same sampling takes only about 16 s; for me it takes around 50 min.
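
To put the two timings side by side, here is a rough back-of-the-envelope calculation. It treats both runs as exactly 120000 draws, which is a simplification (tuning steps and multiple chains are ignored), so the numbers are only indicative.

```python
def draws_per_second(n_draws, seconds):
    """Average sampling rate over a whole run."""
    return n_draws / seconds


n_draws = 120000

tutorial_rate = draws_per_second(n_draws, 16)        # about 16 s in the notebook
observed_rate = draws_per_second(n_draws, 50 * 60)   # about 50 min here
slowdown = tutorial_rate / observed_rate

print(f"tutorial: {tutorial_rate:.0f} draws/s")
print(f"observed: {observed_rate:.0f} draws/s")
print(f"slowdown: about {slowdown:.0f}x")
```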

Note that I ran the same analysis before with pymc (v2), where performance was similar to the “literature”.

I am running this in a Jupyter Notebook
Python version: ‘3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]’
PyMC3: 3.7

Has anyone had a similar experience and found a way around it?

The following are generally not recommended anymore (and we should probably work with Cam to update all the code):

  • pm.find_MAP()
  • pm.Metropolis()

I suggest you just sample with the defaults: trace = pm.sample(). Also, if you are using the default sampler (i.e., NUTS), you don't need manual thinning or burn-in; the tuning samples are discarded by default.

I removed the find_MAP start value and used the default step method (i.e. no step argument); see the code below.

import pymc3 as pm
import theano.tensor as tt

temperature = data[:, 0].astype(float)
D = data[:, 1].astype(float)

with pm.Model() as model:
    beta = pm.Normal('beta', 0, 0.001, testval=0)
    alpha = pm.Normal('alpha', 0, 0.001, testval=0)
    p = pm.Deterministic('p', 1.0 / (1. + tt.exp(beta * temperature + alpha)))

    observed = pm.Bernoulli('bernoulli_obs', p, observed=D)
    trace = pm.sample(120000)

But the sampling rate is still below 100 draws/s. Do you have any other ideas?

EDIT:
I should mention that when loading pymc3 I get the warning shown at the bottom of this post; it mentions severe performance degradation…
So I tried to install “m2w64-toolchain”, but then all hell broke loose and I couldn’t get pymc running at all anymore (sorry, no errors logged; something about the Theano initialization was not right…)

After re-installing everything (including anaconda) I am now back at the same place that I was…

I am quite desperate. I would really like to learn more about this technique, but waiting 5 min for every beginner's mistake I have to uncover is unbearable…

best regards

> 
> WARNING (theano.configdefaults): g++ not available, if using conda: `conda install m2w64-toolchain`
> C:\Users\Lenovo\Anaconda3\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
>   warnings.warn("DeprecationWarning: there is no c++ compiler."
> WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
> WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
> C:\Users\Lenovo\Anaconda3\lib\site-packages\dask\config.py:168: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
>   data = yaml.load(f.read()) or {}
> C:\Users\Lenovo\Anaconda3\lib\site-packages\distributed\config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
>   defaults = yaml.load(f)
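
The g++ lines in the log above are the ones that relate to speed: without a C++ compiler, Theano falls back to Python implementations. A quick way to see what the notebook's Python process can find is to look for g++ on its PATH. This is a minimal sketch using only the standard library; it checks the PATH only, not whether Theano can actually compile with what it finds, and the function name is made up for illustration.

```python
import shutil


def find_compiler(name="g++"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(name)


path = find_compiler()
if path is None:
    print("g++ not found on PATH; Theano would fall back to Python implementations")
else:
    print(f"g++ found at: {path}")
```

Running this in the same Jupyter kernel that imports pymc3 matters, because the kernel's PATH can differ from the one in a terminal.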

You need to install gcc for Windows. On Linux or macOS you could install it with a single command, but since you are using Windows, you have to download it. There is a tutorial here; it is in Spanish (of course you can look for a tutorial in English, but pay attention to step number seven).

Thank you for the suggestions. I installed it and added it to the path…
I got a bit “further” in the sense that the C++ compiler seems to be recognized to some extent; the warning described above does not appear anymore.

But now I run into another error (I am running on a Win10, 64-bit system):

> Problem occurred during compilation with the command line below:
> "C:\MinGW\bin\g++.exe" -shared -g -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -DMS_WIN64 -I"C:\Users\Lenovo\Anaconda3\lib\site-packages\numpy\core\include" -I"C:\Users\Lenovo\Anaconda3\include" -I"C:\Users\Lenovo\Anaconda3\lib\site-packages\theano\gof\c_code" -L"C:\Users\Lenovo\Anaconda3\libs" -L"C:\Users\Lenovo\Anaconda3" -o "C:\Users\Lenovo\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.7.3-64\lazylinker_ext\lazylinker_ext.pyd" "C:\Users\Lenovo\AppData\Local\Theano\compiledir_Windows-10-10.0.17763-SP0-Intel64_Family_6_Model_69_Stepping_1_GenuineIntel-3.7.3-64\lazylinker_ext\mod.cpp" -lpython37
> cc1plus.exe: sorry, unimplemented: 64-bit mode not compiled in
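
The message "64-bit mode not compiled in" usually indicates that the g++ being called is a 32-bit-only build, while the command line passes -m64. One way to check is to ask the compiler for its target triplet with -dumpmachine. The sketch below assumes g++ is on the PATH. The helper names are hypothetical, and the "starts with x86_64" rule is only a heuristic: a 64-bit MinGW-w64 build typically reports something like x86_64-w64-mingw32, while a 32-bit MinGW build typically reports mingw32 or an i686 triplet.

```python
import shutil
import subprocess


def compiler_target(compiler="g++"):
    """Return the target triplet from `<compiler> -dumpmachine`, or None."""
    exe = shutil.which(compiler)
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "-dumpmachine"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    triplet = result.stdout.strip()
    return triplet or None


def looks_x86_64(triplet):
    """Heuristic: 64-bit x86 targets usually start with 'x86_64'."""
    return triplet is not None and triplet.startswith("x86_64")


target = compiler_target()
print("g++ target:", target)
print("looks like a 64-bit x86 build:", looks_x86_64(target))
```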

I found the thread below, apparently discussing the same issue. Unfortunately none of the suggestions worked for the OP, so at this point I am hesitant to invest too much time with little chance of success…
By any chance, does anyone have an update on how to deal with this error?

I found this.

Thank you. I installed the 64-bit version and re-installed pymc3 & theano in several combinations of installation order etc…

Unfortunately still problems.

  • behavior when importing pymc3 is still erratic: overall it takes very long (>10 s) and I get these warnings:

C:\Users\Lenovo\Anaconda3\lib\site-packages\dask\config.py:168: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  data = yaml.load(f.read()) or {}
C:\Users\Lenovo\Anaconda3\lib\site-packages\distributed\config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)

  • importing theano seems to work
  • when I then declare a model basic_model = pm.Model() this seems to work
  • when I start declaring variables (e.g. see below) and run the cell, it again takes >10 s to execute in Jupyter:

    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10, shape=2)
    sigma = pm.HalfNormal('sigma', sigma=1)

  • when I try to sample, I get "Auto-assigning NUTS sampler..." and "Initializing NUTS using jitter+adapt_diag...", but then it freezes until the kernel connection fails.

I have a feeling that some configuration of my machine is severely off (I also get other complaints about a Windows service package in a conda shell…)

I think I will try pymc3 again when I set up my machine again at some point. Until then, pymc (v2.x.x) seems to work.

Thank you anyways for your help.

Environment setup can be tricky on Windows. Are you using conda and installing into a fresh virtual env? In most cases that should do the trick.

Yes, that's what I did. Conda and everything in a new env…
I am currently not in the mood for another re-setup; I will have a Linux partition next time…

regards
