Hi @jessegrabowski, thanks for your continued help.
Indeed, you were right once again. I had downloaded two copies of `datatest.txt`, assumed they were different files, and was using one for `X_train`/`y_train` and the other for `X_test`/`y_test`. Now that I've checked them closely, they are identical… I deleted one of the files, kept the other, and then did:
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```
I then ran the model again. Some points to note:
- Either running `x[,1]..` etc. or `X_train.Hudmity...` etc. takes the same ~200 seconds for the `NUTS: [beta]` part.
- When doing the split approach, I get an error when sampling `[obs]`:

  ```
  ValueError: size does not match the broadcast shape of the parameters. (6514,), (6514,), (1629,)
  ```

  It is worth mentioning that 6514 is the length of the training data and 1629 of the test data. I would imagine it should be fine for the test set to be smaller than the training set?
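For what it's worth, plain NumPy broadcasting also rejects this pair of shapes, since neither dimension is 1, so arrays of length 6514 and 1629 cannot be broadcast together. A minimal reproduction (just the shapes from the error, not my actual arrays):

```python
import numpy as np

# shapes from the error message: training-sized parameters vs. test-sized data
try:
    np.broadcast_shapes((6514,), (1629,))
except ValueError as err:
    print("cannot broadcast:", err)
```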
Now, regarding the BLAS library tests: I am using `pymc >= 4.0.0`, and these are the last lines of the output I get (from the second run, since the script says to run it again):

```
We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).
Total execution time: 10.09s on CPU (with direct Aesara binding to blas).
```
Should you need the entire output of the `check_blas.py` test, please let me know!
Any ideas on what is happening?
PS: Should I use `pymc >= 5.0.0` or … ?
The vectorized function now takes the same time as the other one. Something happened and it's fixed. My only remaining question is how to split the data into different sizes: I can only run the model if `X_train` and `X_test` have the same number of samples. As I showed above, if I split them into different sizes, I get an error when sampling `[obs]`.
Thanks a lot for all the help!