Sample speed Mac vs Linux/Ubuntu

I’m seeing weirdly slow sampling on a fresh conda Python 3.5 w/ PyMC 3.5 install on EC2. I’m running on a pretty beefy c5.4xlarge instance that’s not running anything else, and sampling is several orders of magnitude slower than on my MacBook Pro.
The model is just a simple fully pooled test model:

On Mac:

On Notebook Server running Ubuntu:

Sampling parameters are:

SEED = 42 

    'chains': CHAINS,
    'cores': 3,
    'draws': 2000,
    'tune': 1000,
    'target_accept': 0.8,
    'random_seed': [SEED+s for s in np.arange(0, CHAINS)]
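For readers who want to reuse these settings, here is a minimal sketch of the parameters as a complete dict. The value of `CHAINS` and the name `sample_kwargs` are not given in the post; `CHAINS = 3` is an assumption chosen only to match `'cores': 3`, and plain `range` stands in for `np.arange` so the snippet needs no imports.

```python
SEED = 42
CHAINS = 3  # assumption: not stated in the post, picked to match 'cores': 3

# Hypothetical name for the dict; the original opening line is not shown.
sample_kwargs = {
    'chains': CHAINS,
    'cores': 3,
    'draws': 2000,
    'tune': 1000,
    'target_accept': 0.8,
    # one distinct, reproducible seed per chain
    'random_seed': [SEED + s for s in range(CHAINS)],
}

print(sample_kwargs['random_seed'])
```

A dict like this would presumably be unpacked into the sampling call inside the model context; how each key is accepted may depend on the PyMC3 version.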

Is there anything special to pay attention to when installing PyMC3/Theano on Ubuntu?
I ran the standard conda install pymc3 and had no issues.
I do get these warnings when I import PyMC3:

/home/ubuntu/anaconda3/lib/python3.6/site-packages/theano/ UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
  warnings.warn("DeprecationWarning: there is no c++ compiler."
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/ FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

Btw, the same slowdown shows up with ADVI via fit().

Aaand if I had read the error more carefully, I would have known to do this first:

sudo apt-get install g++
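To confirm the fix took effect before kicking off a long sampling run, a quick check along these lines may help. This is a sketch rather than an official diagnostic: the helper names `find_cxx_compiler` and `theano_cxx` are made up for illustration, and the second helper simply returns `None` if Theano cannot be imported.

```python
import shutil


def find_cxx_compiler(candidates=("g++",)):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def theano_cxx():
    """Return Theano's configured C++ compiler string, or None if Theano
    cannot be imported. An empty string means Theano found no compiler."""
    try:
        import theano
    except Exception:
        return None
    return theano.config.cxx


if __name__ == "__main__":
    print("g++ on PATH:", find_cxx_compiler())
    print("theano.config.cxx:", theano_cxx())
```

If the first line prints `None`, or the second prints an empty string, Theano is most likely still falling back to the slow Python implementations. Restart the Python kernel after installing g++ so Theano detects the compiler on import.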