Sampling is very slow when using theano.scan in pymc3

Hi!

I use scan in my applications as well. It has a bad reputation, but there’s nothing “special” about scan that should cause it to be slower than anything else. In response to my own bellyaching about scan, one of the main Aesara devs wrote a very detailed post about how to benchmark, profile, and debug scan Ops. It might be worth having a look. It helped me get started with profiling my scan functions and tracking down the spots that cause bottlenecks.

It will also be useful to benchmark your scan function by itself first, so you can figure out whether the problem is there or somewhere in the PyMC part of the pipeline. For one application I just assumed scan was the culprit, so I rewrote everything as custom Ops with all the looping inside numba code (I thought this would be faster). Sampling was still slow, because my model was complex and simply difficult to sample from.
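
To make "benchmark it by itself" a bit more concrete, here is a minimal sketch of a timing harness. It only uses the standard library, and everything in it is made up for illustration: `benchmark` is a hypothetical helper, and `cumulative_sum_loop` is a plain-Python stand-in for your compiled scan function (the callable you would get back from `theano.function` or `aesara.function`). Swap in your own function and inputs.

```python
import statistics
import time


def benchmark(fn, *args, repeats=20, warmup=2):
    """Call fn(*args) repeatedly and return the per-call timings in seconds."""
    # Warm-up calls so one-off costs (compilation, caching) don't skew the numbers.
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return timings


def cumulative_sum_loop(xs):
    # Hypothetical stand-in for a compiled scan function: a sequential
    # recurrence where each step depends on the previous one.
    total = 0.0
    out = []
    for x in xs:
        total += x
        out.append(total)
    return out


data = [float(i) for i in range(10_000)]
timings = benchmark(cumulative_sum_loop, data)
print(
    f"median: {statistics.median(timings) * 1e3:.3f} ms, "
    f"min: {min(timings) * 1e3:.3f} ms over {len(timings)} calls"
)
```

If the scan function turns out to be cheap per call and sampling is still slow, the bottleneck is probably elsewhere. Keep in mind that NUTS evaluates the gradient of the log-probability as well, so it is worth timing the compiled gradient function the same way, not just the forward pass.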

Without knowing more about what you’re specifically doing in scheme, it’s impossible to say, but I hope this points you in the right direction.