Bayesian VAR/Multivariate slow performance

I’m currently using PyMC 5.6.0 and have copied the BVAR example line for line.

The progress bar in the example shows sampling completing in 5:59 using CPU only.

On my laptop with a Core i9-12900H (14 cores, 20 logical), the same example takes over 2 hours. (My OS is Windows, but I am running in a Linux Python env via WSL.)

Switching to the laptop GPU (RTX A2000) by changing the sampling method to:

idata.extend(pm.sample(nuts_sampler="numpyro", draws=2000, random_seed=130, nuts_sampler_kwargs={"chain_method": "vectorized"}))

I can get the time down to just under 30 mins.

Is this typical performance for this task, and if so what could be done to speed it up?

Not sure if this is related: Bayesian VAR example notebook: extremely low sampling rate - #16 by emanuele

Edit: Just ran the same code on a bare-metal Linux machine (16-core Threadripper) and am seeing similarly poor performance, so that rules out WSL as a cause.

Edit 2: Same laptop, same code, but running in Windows, CPU only: ~35 mins, compared to 2 hours in Linux. This could possibly be improved, since CPU utilization is only ~35%.

Ok this did the trick:

import os

os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

The example now runs in < 3 mins :slight_smile:
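For anyone landing here later, a minimal sketch of how I'd wrap this up. The likely explanation (an assumption on my part, not something I profiled) is that each sampling chain runs in its own process and each process's BLAS library starts its own thread pool, so the cores get oversubscribed. The helper name `limit_blas_threads` is hypothetical, and `OMP_NUM_THREADS` is an extra variable I added that was not part of the fix above. The important part is that the variables are set before NumPy or PyMC is imported, since the BLAS libraries generally read them when they are first loaded.

```python
import os

# The first two variables are the ones from the fix above.
# OMP_NUM_THREADS is an optional addition (assumption: your BLAS build uses OpenMP).
THREAD_ENV_VARS = ("MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS")


def limit_blas_threads(n_threads: int = 1) -> dict:
    """Set the BLAS/OpenMP thread-count variables and return what was set.

    Hypothetical helper: call it before importing numpy, pytensor, or pymc.
    """
    settings = {name: str(n_threads) for name in THREAD_ENV_VARS}
    os.environ.update(settings)
    return settings


applied = limit_blas_threads(1)

# Only import the numerical libraries after the variables are in place, e.g.:
# import pymc as pm
```

In a notebook, this needs to go in the very first cell, followed by a kernel restart if NumPy was already imported in that session.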


Awesome, we really need to add a note at the top of that pymc-examples notebook.

If anyone is interested, this would be a good beginner issue: Mention likely multi-threading issues on BVAR notebook · Issue #550 · pymc-devs/pymc-examples

That way users don't keep running into the same problem.