Sampling became much slower after upgrading pymc3

Hi,

Over the past few weeks I have been using pymc3 version 3.1 (with Theano 0.8) to fit some simple linear models.
Recently, I upgraded pymc3 to version 3.4.1 using pip. Along with it Theano was automatically upgraded from version 0.8 to version 1.0.1.
In both cases I use the GPU.

I noticed a large slowdown in sampling speed, from 332.31it/s to 6.08it/s.
Do you have any suggestions on how to fix this problem?
Thank you in advance.

Before the upgrade:

Auto-assigning NUTS sampler…
Initializing NUTS using advi…
Average ELBO = 1,733.3: 79%|███████▉ | 39698/50000 [00:05<00:01, 7567.23it/s]Median ELBO converged.
Finished [100%]: Average ELBO = 1,887.9
100%|██████████| 5000/5000 [00:15<00:00, 332.31it/s]

After the upgrade:

Auto-assigning NUTS sampler…
Initializing NUTS using jitter+adapt_diag…
Sequential sampling (1 chains in 1 job)
NUTS: [a, sigma_a_log__, sd_y_log__, b, sigma_b_log__, mu_b, theta, gamma]
66%|██████▋ | 3987/6000 [10:55<05:30, 6.08it/s]
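To put a number on the slowdown, the rates can be read straight off the two progress-bar lines. This is only a rough sketch: the helper name `rate` is made up for illustration, and the "after" figure is the rate shown at 66% of the run, not a final average.

```python
import re

def rate(progress_line):
    """Return the last 'NNN.NNit/s' figure found in a progress-bar line."""
    matches = re.findall(r"(\d+(?:\.\d+)?)it/s", progress_line)
    if not matches:
        raise ValueError("no it/s figure in line")
    return float(matches[-1])

# The two sampling lines from the logs above.
before = rate("100%|██████████| 5000/5000 [00:15<00:00, 332.31it/s]")
after = rate("66%|██████▋ | 3987/6000 [10:55<05:30, 6.08it/s]")

slowdown = before / after
print(f"{slowdown:.1f}x slower")
```

Note that it/s alone does not say whether each iteration got more expensive or whether each draw now takes more NUTS steps.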

The speed goes back to 291.75it/s if I use the CPU instead of GPU.
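For anyone who wants to repeat the CPU/GPU comparison, one common way to switch devices is the `THEANO_FLAGS` environment variable, set before Theano is first imported. The sketch below is an assumption about the setup, not what was actually run here; the helper name `select_theano_device` is hypothetical, and `"cuda"` is the device name used by Theano's newer gpuarray backend.

```python
import os

def select_theano_device(device):
    """Set THEANO_FLAGS so that Theano uses `device` when it is first imported.

    Any existing device= entry is replaced; other flags are kept.
    """
    current = os.environ.get("THEANO_FLAGS", "")
    flags = [f for f in current.split(",") if f and not f.startswith("device=")]
    flags.append("device=" + device)
    os.environ["THEANO_FLAGS"] = ",".join(flags)
    return os.environ["THEANO_FLAGS"]

# Must run before `import theano` / `import pymc3`; Theano reads the flags at import time.
select_theano_device("cpu")   # or "cuda" for the GPU run
# import pymc3 as pm
```

Running the same model once with each device in a fresh Python process gives a like-for-like comparison of the it/s figures.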

That’s interesting. We keep some benchmarks at http://pandas.pydata.org/speed/pymc3/ (you can see one performance regression that was recently fixed by @junpenglao ), but they are all run on the CPU.

A few thoughts:

  • It looks like you’re doing just 1 chain. Is that also on the GPU? Could this commit have something to do with it? (https://github.com/pymc-devs/pymc3/pull/2613)
  • Initialization is better now - are the posteriors from the two versions very different? The new version might be sampling correctly, but slowly (i.e., taking more NUTS steps per draw)
  • What happens if you use ADVI for initialization in v3.4?
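The last two points could be checked with something like the sketch below. It assumes the model is already built as a `pm.Model` (called `model` here, a placeholder), that `draws=5000` matches the original run, and that the installed version accepts the `init` and `chains` arguments as 3.4 does. The names `RUNS` and `compare_inits` are made up for illustration.

```python
# Hypothetical settings for the comparison; only `init` differs between runs.
COMMON = dict(draws=5000, chains=1)
RUNS = {
    "default": dict(COMMON, init="jitter+adapt_diag"),
    "advi": dict(COMMON, init="advi"),
}

def compare_inits(model):
    """Sample `model` once per entry in RUNS and return the mean NUTS tree size."""
    import pymc3 as pm  # imported lazily so the settings can be inspected without PyMC3

    results = {}
    with model:
        for name, kwargs in RUNS.items():
            trace = pm.sample(**kwargs)
            # tree_size is the number of leapfrog steps NUTS took for each draw
            results[name] = trace.get_sampler_stats("tree_size").mean()
    return results
```

If the mean tree size is similar for both initializations while it/s still differs between devices, the slowdown is more likely in the per-step cost (e.g. the GPU backend) than in the sampler taking more steps.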

I had a similar problem when I updated to macOS Catalina. It turned out that I just needed to install the Xcode command line tools, and that completely fixed the problem.

xcode-select --install