To be completely sure, I just rebuilt the installation from scratch following the recommended installation instructions for Linux, to which I added:
conda install -c conda-forge arviz statsmodels python-graphviz ipython
I ran the example cell by cell in IPython until I hit the exact same issue.
Here is the BLAS check output:
$ python -m pytensor.misc.check_blas
[...cut...]
Some PyTensor flags:
blas__ldflags= -L/home/ubuntu/miniconda3/envs/pymc_env/lib -lcblas -lblas -lcblas -lblas
compiledir= /home/ubuntu/.pytensor/compiledir_Linux-5.15--aws-x86_64-with-glibc2.31-x86_64-3.11.3-64
floatX= float64
device= cpu
Some OS information:
sys.platform= linux
sys.version= 3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:57:19) [GCC 11.3.0]
sys.prefix= /home/ubuntu/miniconda3/envs/pymc_env
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the PyTensor flag "blas__ldflags" is empty)
blas_info:
libraries = ['cblas', 'blas', 'cblas', 'blas']
library_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/lib']
include_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/include']
language = c
define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
libraries = ['cblas', 'blas', 'cblas', 'blas']
library_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/lib']
include_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/include']
language = c
lapack_info:
libraries = ['lapack', 'blas', 'lapack', 'blas']
library_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/lib']
language = f77
lapack_opt_info:
libraries = ['lapack', 'blas', 'lapack', 'blas', 'cblas', 'blas', 'cblas', 'blas']
library_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/lib']
language = c
define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
include_dirs = ['/home/ubuntu/miniconda3/envs/pymc_env/include']
Supported SIMD extensions in this NumPy install:
baseline = SSE,SSE2,SSE3
found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2,AVX512F,AVX512CD,AVX512_SKX,AVX512_CLX
not found = AVX512_CNL,AVX512_ICL
Numpy dot module: numpy
Numpy location: /home/ubuntu/miniconda3/envs/pymc_env/lib/python3.11/site-packages/numpy/__init__.py
Numpy version: 1.24.3
We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).
Total execution time: 1.59s on CPU (with direct PyTensor binding to blas).
Try to run this script a few times. Experience shows that the first time is not as fast as following calls. The difference is not big, but consistent.
Note: the machine I'm using has many CPUs, but only two are used while the code is running.
Note 2: if I run another PyMC model on the same machine and in the same environment, sampling is much faster and uses multiple cores, so the problem seems specific to this example.
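In case it helps anyone reproduce the two-CPU observation, here is a minimal sketch for checking how many CPUs the process can actually use and whether any thread-limiting variables are set. The helper name `thread_report` is made up for illustration, and `OPENBLAS_NUM_THREADS` is my own addition (it does not appear in the check_blas output above); the other variable names are the ones check_blas prints.

```python
import os

# Thread-related environment variables. The first three are the ones
# reported by check_blas; OPENBLAS_NUM_THREADS is an extra assumption.
THREAD_VARS = (
    "MKL_NUM_THREADS",
    "OMP_NUM_THREADS",
    "GOTO_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
)


def thread_report(env=None):
    """Collect thread-limit variables and CPU counts (hypothetical helper)."""
    env = os.environ if env is None else env
    # Unset variables are reported as None, like in the check_blas output.
    report = {name: env.get(name) for name in THREAD_VARS}
    # Total number of logical CPUs on the machine.
    report["cpu_count"] = os.cpu_count()
    # CPUs this process is allowed to run on (not available on every OS).
    if hasattr(os, "sched_getaffinity"):
        report["usable_cpus"] = len(os.sched_getaffinity(0))
    else:
        report["usable_cpus"] = None
    return report


if __name__ == "__main__":
    for key, value in thread_report().items():
        print(f"{key}= {value}")
```

If `usable_cpus` matches `cpu_count` and all the variables are unset, the two-core limit is probably not coming from the environment, which would be consistent with the problem being specific to the example.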