BLAS issues on both Ubuntu and Windows (PyMC>=5.0.0)

This might become a very long thread but I really need some help with this. The beginning of this discussion can be found in How to properly use split data with pymc. Thanks a lot @jessegrabowski for your help there!

I will divide the information I have gathered so far into two cases: one for my work laptop (Windows) and one for my personal laptop (Ubuntu). Although I hate all the warnings I’m getting on Ubuntu, performance-wise everything works fine there. However, my main interest in PyMC at the moment is a work project, so it is very important that everything runs smoothly on my Windows machine.

To the issue. All cases were tested with the same code:

import numpy as np
import pymc as pm

with pm.Model() as model:
    X = pm.MutableData('X', X_train)
    y = pm.MutableData('y', y_train)

    beta = pm.Normal('beta', mu=0, sigma=5, shape=len(X_train.columns))
    constant = pm.Normal('constant', mu=0, sigma=5)

    p = pm.Deterministic('p', pm.math.sigmoid(constant + X @ beta))

    observed = pm.Bernoulli("obs", p, observed=y)
    idata = pm.sample(3000)

# Swap in the test data, then sample predictions from the fitted posterior
with model:
    pm.set_data({'X': X_test, 'y': np.zeros_like(y_test)})
    y_pred = pm.sample_posterior_predictive(idata)

General Info

It all started when my model (on my Ubuntu machine) took 15~18 min to train with the vector notation X @ beta, while it took only 100~120 seconds when I wrote everything out explicitly, like X[:,0]*beta[0] + X[:,1]*beta[1] + ... + X[:,n]*beta[n].
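For anyone following along, here is a minimal NumPy sketch of why the two notations should give the same linear predictor (the per-column terms are summed). The array sizes and random data are made up for illustration and are not from my dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))   # hypothetical design matrix: 200 rows, 4 predictors
beta = rng.normal(size=4)       # hypothetical coefficients

# Vector notation: a single matrix-vector product
eta_vector = X @ beta

# Explicit notation: one elementwise multiply per column, then summed
eta_explicit = sum(X[:, i] * beta[i] for i in range(X.shape[1]))

# Both forms should agree up to floating-point rounding
same = np.allclose(eta_vector, eta_explicit)
print(same)
```

Since the results match, any large timing gap between the two forms points at how the matrix product is executed (for example, which BLAS library is linked) rather than at the model itself.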

Then @jessegrabowski suggested that I run the check_blas.py program from the pytensor library and check two things:

  1. How long the test takes, compared with the benchmarks listed by check_blas.py.
  2. Whether the blas__ldflags line contains the -lmkl_rt flag.
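The second check can also be done from Python. Below is a rough sketch, assuming PyTensor exposes the linker flags as pytensor.config.blas__ldflags; the helper names has_mkl_flag and current_ldflags are my own and not part of any library.

```python
from typing import Optional


def has_mkl_flag(ldflags: str) -> bool:
    """Return True if the linker flags string contains the -lmkl_rt token."""
    return "-lmkl_rt" in ldflags.split()


def current_ldflags() -> Optional[str]:
    """Read blas__ldflags from PyTensor's config, or None if PyTensor is missing."""
    try:
        import pytensor
    except ImportError:
        return None
    return pytensor.config.blas__ldflags


if __name__ == "__main__":
    flags = current_ldflags()
    if flags is None:
        print("pytensor is not installed in this environment")
    else:
        print("blas__ldflags:", repr(flags))
        print("links against MKL runtime:", has_mkl_flag(flags))
```

For the timing check, the benchmark itself can probably be run with python -m pytensor.misc.check_blas, assuming the script still lives at pytensor/misc/check_blas.py in your installed version.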

Now for the individual cases:

Ubuntu

My benchmark for check_blas.py seemed fine: around 10 seconds to execute 10 calls to gemm with matrices of shape (5000, 5000). However, I did not have the -lmkl_rt flag.
I then installed MKL. I still didn’t have the -lmkl_rt flag, but now both the vector form and the explicit form took around 100~120 seconds, which is great.

However, after doing all of that I started getting a bunch of warnings, such as:

  1. when importing pymc
/home/guin0x/.local/lib/python3.10/site-packages/pkg_resources/__init__.py:123:
 PkgResourcesDeprecationWarning: 1.12.1-git20200711.33e2d80-dfsg1-0.6 is an invalid version 
and will not be supported in a future release
  warnings.warn(
/home/guin0x/.local/lib/python3.10/site-packages/pkg_resources/__init__.py:123:
 PkgResourcesDeprecationWarning: 1.1build1 is an invalid version 
and will not be supported in a future release
  warnings.warn(
/home/guin0x/.local/lib/python3.10/site-packages/pkg_resources/__init__.py:123:
 PkgResourcesDeprecationWarning: 0.1.43ubuntu1 is an invalid version 
and will not be supported in a future release
  warnings.warn(

  2. when running the model (FIXED by uninstalling Aesara; thanks @ricardoV94)

/home/guin0x/.local/lib/python3.10/site-packages/multipledispatch/dispatcher.py:27: AmbiguityWarning: 
Ambiguities exist in dispatched function _unify

The following signatures may result in ambiguous behavior:
	[ConstrainedVar, object, Mapping], [object, ConstrainedVar, Mapping]
	[object, ConstrainedVar, Mapping], [ConstrainedVar, object, Mapping]
	[object, ConstrainedVar, Mapping], [ConstrainedVar, Var, Mapping]
	[object, ConstrainedVar, Mapping], [ConstrainedVar, Var, Mapping]


Consider making the following additions:

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)
  warn(warning_text(dispatcher.name, ambiguities), AmbiguityWarning)

Windows

After all that happened on my Ubuntu machine, I decided to test what would happen on my Windows machine. I first did the BLAS checks:

My benchmark for check_blas.py seemed fine: around 12 seconds to execute 10 calls to gemm with matrices of shape (5000, 5000). And I did have the -lmkl_rt flag there.

Since the benchmark seemed good enough, and I had the -lmkl_rt flag, I thought the model would train in a decent time. It wasn’t the case…

For the vector notation it took 689 seconds, while the explicit notation took 443 seconds. I would expect that, if everything had gone smoothly, this shouldn’t take much more than 150 seconds (given that both laptops are very similar).

I think it is also worth mentioning that on my Windows machine I get WARNING (aesara.tensor.blas): Using NumPy C-API based implementation for BLAS functions when importing arviz, so it makes sense that it is taking more time.

It is also worth mentioning that I have tried the solution of installing PyMC again using mamba, as proposed in the thread PyMC V4 install issue: WARNING (Using NumPy C-API based implementation for BLAS functions.

Questions

  1. How do I fix all the warnings I’m getting on my Ubuntu machine? (I honestly believe I should just completely format my laptop and install Ubuntu from scratch; it feels like my installation always leaves me hanging.)

  2. How do I fix the performance problem on my Windows machine?

Thanks a lot in advance!

Your second warning looks like an issue I saw from having both Aesara and PyTensor installed. Try removing the former if you are using PyMC v5.
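A quick way to confirm whether both backends are present is to ask the standard library for the installed versions. This is only a sketch: installed_version and backend_conflict are hypothetical helper names, and it assumes the packages are published under the distribution names pymc, aesara and pytensor.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def backend_conflict(versions: dict) -> bool:
    """True when both aesara and pytensor are installed side by side."""
    return versions.get("aesara") is not None and versions.get("pytensor") is not None


versions = {name: installed_version(name) for name in ("pymc", "aesara", "pytensor")}
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
if backend_conflict(versions):
    print("Both aesara and pytensor are installed; with PyMC v5, try uninstalling aesara.")
```

Run it with the same interpreter that runs the model, otherwise it reports on a different environment.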

Thanks!
I’ve edited the question to include that this has been solved.

I have also posted the remaining unanswered parts of the question on Stack Overflow.

Hopefully I’ll find a solution to speed up the linear algebra calculations on my Windows machine soon.

I think I fixed my issues for now… Since it was complaining about aesara, and PyMC 5.0.0 uses pytensor, I thought updating would fix the issue…

After updating, I could see that I had pymc 5.0.0 installed in my conda env, but Jupyter Notebook was still using pymc 4.0.0. I think the issue was that my env was running on Python 3.11.

I then created a new kernel with Python 3.8 and everything worked fine.
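If you suspect the same mismatch between your conda env and the Jupyter kernel, a small report like the one below, run once in a notebook cell and once in the env's terminal, should show whether they use the same interpreter. It is a sketch only; environment_report is a made-up helper name.

```python
import sys
from importlib import metadata


def environment_report(packages=("pymc",)) -> dict:
    """Collect the interpreter path, Python version, and package versions."""
    report = {
        "executable": sys.executable,
        "python": ".".join(str(part) for part in sys.version_info[:3]),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package not visible to this interpreter
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If the executable paths differ between the notebook and the terminal, the notebook is running a kernel from another environment, which would explain seeing an older pymc version there.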
