Very slow 2D convolution with PyTensor

PyTensor didn’t drop BLAS/GEMM support, and the performance difference you’re seeing compared to Aesara is very unexpected. Can you confirm that PyTensor can see the BLAS bindings? This often fails when installing via pip instead of conda-forge.
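One quick way to check is to look at the BLAS link flags PyTensor detected. This is only a rough sketch: the helper name `describe_blas_setup` is made up for illustration, and it assumes the `blas__ldflags` config option is where your PyTensor version reports the flags. An empty value usually means no BLAS library was found.

```python
import importlib.util


def describe_blas_setup():
    """Return a short note on the BLAS link flags PyTensor detected."""
    # Guard against running this in an environment without PyTensor.
    if importlib.util.find_spec("pytensor") is None:
        return "pytensor is not installed in this environment"

    import pytensor

    # Assumption: an empty string here means PyTensor found no BLAS to link against.
    flags = pytensor.config.blas__ldflags
    if not flags:
        return "no BLAS link flags detected; PyTensor is likely using a slower fallback"
    return f"BLAS link flags: {flags}"


print(describe_blas_setup())
```

If the flags come back empty in a pip environment, trying a fresh conda-forge environment is probably the easiest next step.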

If you can share a minimal code example, someone can try to replicate it. If performance deteriorated that much, we would consider it a bug and try to fix it.

Regarding GPU, that was deemed too challenging to support properly with our limited resources. However, PyTensor functions can easily be transpiled to JAX, which has great native support for GPUs and TPUs.

Just pass mode="JAX" to pytensor.function.
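For example, something along these lines should work, assuming both `pytensor` and `jax` are installed. The function name `sum_exp_with_jax` and the toy graph are just placeholders for illustration, not anything from your model.

```python
def sum_exp_with_jax(values):
    """Compile sum(exp(x)) with the JAX backend and evaluate it on `values`."""
    # Imports are local so the snippet only needs pytensor/jax when it is called.
    import numpy as np
    import pytensor
    import pytensor.tensor as pt

    x = pt.matrix("x")
    # mode="JAX" asks PyTensor to transpile the graph to JAX instead of compiling C.
    f = pytensor.function([x], pt.exp(x).sum(), mode="JAX")
    return f(np.asarray(values, dtype=pytensor.config.floatX))
```

Calling `sum_exp_with_jax(np.zeros((2, 3)))` should give a value close to 6, since exp(0) = 1 for each of the six entries. Whether it actually runs on a GPU depends on how JAX itself was installed.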