How to properly use split data with PyMC?

:sob: :sob: :sob: :sob: :sob: :sob: :sob: :sob:

First of all, thanks for the tip on how to have different sizes of train/test data — that works.

However, I am crying because the vectorized form didn’t actually work. I believe it only seemed to work before because I had reduced X_train so much that by_hand and X @ beta ran at about the same speed.

Now, again, it’s taking around 120 sec for the by_hand approach, while X @ beta takes ~18 min. Maybe it is because of the BLAS thing; please see my remarks below.
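For anyone who wants to reproduce the comparison outside of a model, here is a minimal sketch of the two forms in plain NumPy. It assumes that "by_hand" means summing one `X[:, j] * beta[j]` term at a time; the array sizes and the helper names (`mu_by_hand`, `mu_vectorized`, `best_time`) are made up for illustration. NumPy is not the compiled graph that PyMC runs, so the timings here will not necessarily match what the sampler sees.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
n_obs, n_features = 20_000, 50  # hypothetical sizes, not the real X_train
X = rng.normal(size=(n_obs, n_features))
beta = rng.normal(size=n_features)


def mu_by_hand(X, beta):
    """Build the linear predictor one column at a time (assumed 'by_hand' form)."""
    mu = np.zeros(X.shape[0])
    for j in range(X.shape[1]):
        mu += X[:, j] * beta[j]
    return mu


def mu_vectorized(X, beta):
    """Build the linear predictor with a single matrix-vector product."""
    return X @ beta


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time over a few repeats, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(X, beta)
        best = min(best, time.perf_counter() - start)
    return best


print(f"by_hand:  {best_time(mu_by_hand):.4f} s")
print(f"X @ beta: {best_time(mu_vectorized):.4f} s")
print("same result:", np.allclose(mu_by_hand(X, beta), mu_vectorized(X, beta)))
```

Both functions compute the same vector, so any large timing gap between them points at how the matrix product is executed rather than at the model itself.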

According to the benchmark given by check_blas.py, openblas/8 on an i7 950 took 3.70s for “10 executions of gemm in float64 with matrices of shape 2000x2000”, while mine took around 10s for 10 executions of gemm with matrices of shape 5000x5000. So it seems to me that the timing is OK(?), given that the i7 950 is a bit worse than my AMD Ryzen 7 5800H and the matrices I tested were 2.5x bigger per side (roughly 15.6x more work per gemm, since the cost grows with the cube of the size).
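To put the two timings on the same scale, here is a rough back-of-the-envelope calculation. It assumes the usual estimate of about 2·n³ floating-point operations for one n×n gemm; the function name `gemm_gflops` is made up for this sketch, and the inputs are just the numbers quoted above.

```python
def gemm_gflops(n, executions, seconds):
    """Approximate throughput in GFLOP/s, assuming ~2 * n**3 flops per n x n gemm."""
    total_flops = 2 * n**3 * executions
    return total_flops / seconds / 1e9


# Reference row: openblas/8 on an i7 950, 10 gemms of 2000x2000 in 3.70 s
reference = gemm_gflops(2000, 10, 3.70)

# My run: 10 gemms of 5000x5000 in about 10 s
mine = gemm_gflops(5000, 10, 10.0)

# How much more work one 5000x5000 gemm is than one 2000x2000 gemm
work_ratio = (5000 / 2000) ** 3

print(f"reference: {reference:.1f} GFLOP/s")
print(f"mine:      {mine:.1f} GFLOP/s")
print(f"work per gemm ratio: {work_ratio:.3f}")
```

Under that assumption my run comes out several times faster than the reference row in throughput terms, which is consistent with the BLAS itself not being the obvious bottleneck.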

I do not have this flag; the only flag I have is -lblas. Should I install MKL (even though my CPU is AMD)? Or should I install AML?

-------------------------------EDIT-------------------------------
I have upgraded to pymc>=5.0.0 and installed MKL. The vector form now seems to be working properly, but I’ll test it a few more times before I say for sure. It is worth mentioning that the BLAS tests with both aesara and pytensor still don’t show the -lmkl_rt flag, and both tests take the same ~10 sec.
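For reference, this is a small sketch of how the link flags can be read directly instead of going through the full benchmark. It assumes pytensor is installed and reads its `blas__ldflags` config value; the helper name `blas_ldflags` is made up here, and it returns `None` when pytensor cannot be imported.

```python
def blas_ldflags():
    """Return PyTensor's BLAS linker flags, or None if pytensor is not installed."""
    try:
        import pytensor
    except ImportError:
        return None
    return pytensor.config.blas__ldflags


flags = blas_ldflags()
if flags is None:
    print("pytensor is not installed in this environment")
else:
    # An empty string means PyTensor did not find a BLAS to link against.
    print("blas__ldflags:", repr(flags))
    print("mentions mkl:", "mkl" in flags)
```

This only reports what PyTensor is configured to link against; it says nothing about which BLAS NumPy uses.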

But now I get a lot of different warnings when importing and using pymc. Nonetheless, I believe the main question of this thread has been answered, and I’ll create a new thread with my other questions regarding MKL/BLAS and all these warnings.

Thanks once again for your very helpful assistance.