How to properly use split data with PyMC?

In the “by hand” method, you’re not using the mutable data object in the computation of p. Are you positive those are out-of-sample predictions? My guess is that the pm.set_data line actually does nothing in this case, since p never depends on the object it updates.

I’d check whether the “by hand” method is also slow if you write:

p = pm.Deterministic('p', pm.math.sigmoid(beta[0] + beta[1] * x[:, 0] + beta[2] * x[:, 1] + beta[3] * x[:, 2] + beta[4] * x[:, 3] + beta[5] * x[:, 4]))
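
For intuition, here is a plain NumPy sketch (not PyMC code) of what that expression computes. The coefficient values and array sizes are made up, and `p_by_hand` and `p_vectorized` are hypothetical helper names used only for illustration. The point is that p gets one entry per row of whatever array is passed in as x, so if the model really uses the swapped-in test data, p should come back with the length of the test set rather than the training set.

```python
import numpy as np


def sigmoid(z):
    # Logistic function, mapping any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))


def p_by_hand(beta, x):
    # Same term-by-term expression as in the post, on NumPy arrays
    return sigmoid(
        beta[0]
        + beta[1] * x[:, 0]
        + beta[2] * x[:, 1]
        + beta[3] * x[:, 2]
        + beta[4] * x[:, 3]
        + beta[5] * x[:, 4]
    )


def p_vectorized(beta, x):
    # Equivalent form: intercept plus a matrix-vector product
    return sigmoid(beta[0] + x @ beta[1:])


rng = np.random.default_rng(0)
beta = rng.normal(size=6)           # made-up coefficients (intercept + 5 slopes)
x_train = rng.normal(size=(80, 5))  # made-up training predictors
x_test = rng.normal(size=(20, 5))   # made-up held-out predictors

p_train = p_by_hand(beta, x_train)
p_test = p_by_hand(beta, x_test)

print(p_train.shape, p_test.shape)  # (80,) (20,)
```

If the p you get back after pm.set_data still has the training-set length, that would suggest the new data never reached the computation of p.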

I’d also run the aesara BLAS test to check that your linear algebra libraries are all set up correctly. This can be done by running:

python `python -c "import os, aesara; print(os.path.dirname(aesara.__file__))"`/misc/check_blas.py

(If you’re on pymc>=5.0.0, replace aesara with pytensor everywhere in that command)