Bayesian VAR example notebook: extremely low sampling rate

To be completely sure, I rebuilt the installation from scratch following the recommended installation instructions for Linux, and additionally installed:

conda install -c conda-forge arviz statsmodels python-graphviz ipython

I ran the example cell by cell in IPython until I hit the exact same issue.

Here is the BLAS check output:

$ python -m pytensor.misc.check_blas

[...cut...]

Some PyTensor flags:
    blas__ldflags= -L/home/ubuntu/miniconda3/envs/pymc_env/lib -lcblas -lblas -lcblas -lblas
    compiledir= /home/ubuntu/.pytensor/compiledir_Linux-5.15--aws-x86_64-with-glibc2.31-x86_64-3.11.3-64
    floatX= float64
    device= cpu
Some OS information:
    sys.platform= linux
    sys.version= 3.11.3 | packaged by conda-forge | (main, Apr  6 2023, 08:57:19) [GCC 11.3.0]
    sys.prefix= /home/ubuntu/miniconda3/envs/pymc_env
Some environment variables:
    MKL_NUM_THREADS= None
    OMP_NUM_THREADS= None
    GOTO_NUM_THREADS= None

Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
blas_info:
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/lib']
    include_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/include']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/lib']
    include_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/include']
    language = c
lapack_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas']
    library_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/lib']
    language = f77
lapack_opt_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas', 'cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/lib']
    language = c
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/include']
Supported SIMD extensions in this NumPy install:
    baseline = SSE,SSE2,SSE3
    found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2,AVX512F,AVX512CD,AVX512_SKX,AVX512_CLX
    not found = AVX512_CNL,AVX512_ICL
Numpy dot module: numpy
Numpy location: /home/ubuntu/miniconda3/envs/pymc_env/lib/python3.11/site-packages/numpy/__init__.py
Numpy version: 1.24.3

We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).

Total execution time: 1.59s on CPU (with direct PyTensor binding to blas).

Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.

Note: the machine I’m using has many CPUs, but only two of them are in use while the code runs.

Note 2: if I run a different PyMC model on the same machine and environment, sampling is much faster and uses multiple cores. So the problem seems to be specific to this example.
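
To help narrow down the "only two CPUs" observation, here is a minimal sketch that reports how many CPUs the Python process is allowed to use and whether any of the thread-limit variables from the check_blas output are set. It uses only the standard library; `cpu_report` and `THREAD_VARS` are hypothetical names chosen for illustration, not part of PyMC or PyTensor, and the affinity check assumes Linux.

```python
import os

# Thread-limit variables that appear in the check_blas output above.
THREAD_VARS = ("MKL_NUM_THREADS", "OMP_NUM_THREADS", "GOTO_NUM_THREADS")


def cpu_report():
    """Collect CPU counts and thread-limit settings (hypothetical helper)."""
    report = {"logical_cpus": os.cpu_count()}
    # os.sched_getaffinity is only available on some platforms (e.g. Linux).
    # It returns the set of CPUs this process is allowed to run on, which
    # can be smaller than os.cpu_count() inside containers or under taskset.
    if hasattr(os, "sched_getaffinity"):
        report["usable_cpus"] = len(os.sched_getaffinity(0))
    else:
        report["usable_cpus"] = report["logical_cpus"]
    for name in THREAD_VARS:
        report[name] = os.environ.get(name)  # None means the variable is unset
    return report


if __name__ == "__main__":
    for key, value in cpu_report().items():
        print(f"{key}= {value}")
```

If `usable_cpus` matches the machine's core count and the variables are all unset, the environment itself is probably not what restricts the run to two cores, which would be consistent with the other model sampling on multiple cores.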