This might become a very long thread but I really need some help with this. The beginning of this discussion can be found in How to properly use split data with pymc. Thanks a lot @jessegrabowski for your help there!
I will divide the information I have gathered so far into two cases: one on my work laptop (Windows) and one on my personal laptop (Ubuntu). It is also important to mention that, although I hate all the warnings I'm getting on my Ubuntu machine, performance-wise everything there is working fine. However, my main interest in PyMC at the moment is a project for work, so it is very important that everything runs smoothly on my Windows machine.
To the issue. All cases were tested with the same block of code:
```python
with pm.Model() as model:
    X = pm.MutableData('X', X_train)
    y = pm.MutableData('y', y_train)

    beta = pm.Normal('beta', mu=0, sigma=5, shape=len(X_train.columns))
    constant = pm.Normal('constant', mu=0, sigma=5)

    p = pm.Deterministic('p', pm.math.sigmoid(constant + X @ beta))
    observed = pm.Bernoulli("obs", p, observed=y)

    idata = pm.sample(3000)

with model:
    pm.set_data({'X': X_test})
    pm.set_data({'y': np.zeros_like(y_test)})
    y_pred = pm.sample_posterior_predictive(idata)
```
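As a side note, once `sample_posterior_predictive` has run, the Bernoulli draws still need to be collapsed into point predictions for the test set. This is a minimal NumPy sketch of that step, assuming the draws come out with shape `(chains, draws, n_obs)` (as in `y_pred.posterior_predictive["obs"].values`); the helper name `point_predictions` is mine, not part of PyMC:

```python
import numpy as np

def point_predictions(obs_draws, threshold=0.5):
    """Collapse Bernoulli posterior-predictive draws into 0/1 predictions.

    obs_draws: array of shape (chains, draws, n_obs) containing 0/1 samples.
    """
    # Fraction of posterior-predictive samples equal to 1, per test row
    p_hat = obs_draws.mean(axis=(0, 1))
    return (p_hat >= threshold).astype(int), p_hat

# Tiny synthetic example: 2 chains, 4 draws, 3 observations
draws = np.array([
    [[1, 0, 1], [1, 0, 1], [1, 0, 0], [1, 0, 1]],
    [[1, 1, 1], [1, 0, 0], [1, 0, 1], [1, 0, 1]],
])
labels, p_hat = point_predictions(draws)
print(labels)  # -> [1 0 1]
```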
General Info
It all started when my model (on my Ubuntu machine) was taking 15~18 min to train when using the vector notation `X @ beta`, while it was taking 100~120 seconds to train when writing everything out explicitly as `X[:,0]*beta[0] + X[:,1]*beta[1] + ... + X[:,n]*beta[n]`.
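To be clear, the two notations are mathematically identical; only the computational graph they produce differs. A quick plain-NumPy check of the equivalence (the array sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
beta = rng.normal(size=3)

# Vectorized matrix-vector product
vectorized = X @ beta
# Explicit column-by-column sum, as written out in the slow/fast comparison
explicit = X[:, 0] * beta[0] + X[:, 1] * beta[1] + X[:, 2] * beta[2]

print(np.allclose(vectorized, explicit))  # -> True
```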
Then, @jessegrabowski suggested that I run the `check_blas.py` program within the `pytensor` library and check two things:
- How long the test takes, comparing that with the benchmark provided by `check_blas.py`.
- Whether the `blas__ldflags` line has the `-lmkl_rt` flag.
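For readers who want to reproduce the timing side of this at home without `pytensor`, here is a rough pure-NumPy analogue of what `check_blas.py` measures: repeated dense matrix-matrix products (gemm). `check_blas.py` itself uses 10 calls on (5000, 5000) matrices; the smaller size below just keeps the sketch quick:

```python
import time
import numpy as np

n, iters = 500, 10
rng = np.random.default_rng(0)
a = rng.normal(size=(n, n))
b = rng.normal(size=(n, n))

start = time.perf_counter()
for _ in range(iters):
    c = a @ b  # dispatched to the BLAS gemm NumPy was built against
elapsed = time.perf_counter() - start
print(f"{iters} gemm calls on ({n}, {n}) matrices: {elapsed:.3f}s")
```

A slow result here points at the BLAS NumPy links against, which may differ from the one `pytensor` finds via `blas__ldflags`.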
Now for the individual cases:
Ubuntu
My benchmark for `check_blas.py` seemed fine: around 10 seconds to execute 10 calls to gemm with matrices of shape (5000, 5000). However, I did not have the `-lmkl_rt` flag.
I then installed MKL. I still didn't have the `-lmkl_rt` flag, but now both the vector form and the explicit form were taking around 100~120 seconds, which is great.
However, after doing all of that I started getting a bunch of warnings, such as:
1) when importing `pymc`

```
/home/guin0x/.local/lib/python3.10/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 1.12.1-git20200711.33e2d80-dfsg1-0.6 is an invalid version and will not be supported in a future release
  warnings.warn(
/home/guin0x/.local/lib/python3.10/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 1.1build1 is an invalid version and will not be supported in a future release
  warnings.warn(
/home/guin0x/.local/lib/python3.10/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 0.1.43ubuntu1 is an invalid version and will not be supported in a future release
  warnings.warn(
```
2) when running the model (FIXED by uninstalling Aesara; thanks @ricardoV94)

```
/home/guin0x/.local/lib/python3.10/site-packages/multipledispatch/dispatcher.py:27: AmbiguityWarning:
Ambiguities exist in dispatched function _unify

The following signatures may result in ambiguous behavior:
	[ConstrainedVar, object, Mapping], [object, ConstrainedVar, Mapping]
	[object, ConstrainedVar, Mapping], [ConstrainedVar, object, Mapping]
	[object, ConstrainedVar, Mapping], [ConstrainedVar, Var, Mapping]
	[object, ConstrainedVar, Mapping], [ConstrainedVar, Var, Mapping]

Consider making the following additions:

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)

@dispatch(ConstrainedVar, ConstrainedVar, Mapping)
def _unify(...)

  warn(warning_text(dispatcher.name, ambiguities), AmbiguityWarning)
```
Windows
After all that happened on my Ubuntu machine, I decided to test what would happen in my Windows machine. I first did the BLAS checks:
My benchmark for `check_blas.py` seemed fine: around 12 seconds to execute 10 calls to gemm with matrices of shape (5000, 5000). And I did have the `-lmkl_rt` flag there.
Since the benchmark seemed good enough, and I had the `-lmkl_rt` flag, I thought the model would train in a decent time. That wasn't the case…

The vector notation took 689 seconds, while the explicit notation took 443 seconds. If everything had gone smoothly, I would expect this to take not much more than 150 seconds (given that both laptops are very similar).
I think it is also worth mentioning that on my Windows machine I get `WARNING (aesara.tensor.blas): Using NumPy C-API based implementation for BLAS functions` when importing `arviz`; so it makes sense that it is taking more time.
It is also worth mentioning that I have already tried reinstalling PyMC using `mamba`, as proposed at PyMC V4 install issue: WARNING (Using NumPy C-API based implementation for BLAS functions.
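One quick diagnostic that may help narrow this down: `numpy.show_config()` prints the BLAS/LAPACK libraries NumPy itself was built against. This is not the same thing as pytensor's `blas__ldflags`, but on an MKL-based conda/mamba environment the output typically mentions `mkl`, and a mismatch between the two can explain the C-API fallback warning:

```python
import numpy as np

# Print which BLAS/LAPACK NumPy is linked against; look for "mkl"
# (or "openblas") in the library names listed.
np.show_config()
```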
Questions
1. How do I fix all the warnings I'm getting on my Ubuntu machine? (I honestly believe I should just completely wipe my laptop and install Ubuntu from scratch; it feels like my installation always leaves me hanging.)
2. How do I fix the performance problem on my Windows machine?
Thanks a lot in advance!