Very slow 2d convolution with Pytensor

pbaggens · May 13, 2023, 6:40pm

As I understand it, pytensor is a fork of aesara, which is a fork of theano. I have written scripts that convert theano code to either aesara or pytensor. I have a 2d convolution problem which gets the following execution time for the same problem (same code) Theano-GPU (0.7 sec), Aesara-CPU 3.4 sec, Pytensor-CPU 37 sec. This is practically unusable for me. So as Theano gets further forked, processing time gets further f**ked. I understand the technical reasons, aesara dropped the gpuarray backend, and pytensorr seems to have dropped the GEMM/Blas optimizations. In fact, I dug down deep and it seems pytensor is using convolution in abstract_conv.py, which is just a bunch of nested python loops. No wonder it is so slow. So my question is, is there any plan to add optimizations to Conv Ops?

ricardoV94 · May 13, 2023, 8:26pm

PyTensor didn’t drop BLAS/GEMM support and your difference in performance from Aesara is very unexpected. Can you confirm PyTensor can see the Blas bindings? This often fails when installing via pip instead of conda-forge.

If you can share a minimal code someone can try and replicate it. If performance deteriorated that much we would consider it a bug an try to fix it.

Regarding GPU, that was deemed too challenging to support properly with our limited resources. However, PyTensor functions can be easily transpiled into JAX which has great native support for GPUs and TPUs

Just pass mode="JAX" to pytensor.function

pbaggens · May 14, 2023, 5:58am

Many thanks for the info and suggestions. That gives me hope and I’ll track down the problem with BLAS/Gemm. I did use mode=JAX, and I do have jax[gpu] correctly installed, but it complained that JAX-ified Ops are not available for many of the Ops I use. I’ll report back. Thanks again

ricardoV94 · May 14, 2023, 8:42am

Feel free to open an issue on our repository listing any missing JAX Ops that you need: GitHub - pymc-devs/pytensor: PyTensor is a fork of Aesara -- a Python library for defining, optimizing, and efficiently evaluating mathematical expressions involving multi-dimensional arrays.

You may even choose to contribute the functionality yourself. Here is a recent example that shows how easy it can be: Add jax implementation of `pt.linalg.pinv` by jessegrabowski · Pull Request #294 · pymc-devs/pytensor · GitHub

pbaggens · May 16, 2023, 8:02am

So, I re-installed PyTensor with conda -c conda-forge, and yes, I see now that the c-compiler and all dependencies are properly installed including (I think) GEMM/Blas And, I have seen the speedup of the execution time for the case without 2D convolution. But, for 2D convolution, there is still a huge difference. I notice that in AESARA, there is aesara/tensor/nnet/conv.py which has optimized convolution drawn from
from scipy.signal._sigtools import _convolve2d
and implemented as an OpenMP Op.

But in pytensor, this has been apparently removed. The convolution appears to be done brute-force in abstract_conv.py
Am I missing something here?
Thanks in advance

ricardoV94 · May 17, 2023, 7:36am

Yes, you’re correct. Even in Aesara it is marked as deprecated: aesara/__init__.py at 4b266671f5b60ba2cf9e435248febfa07f632824 · aesara-devs/aesara · GitHub

In general we are trying to move away from the old C_code to minimize maintenance burden and betting on the Numba/JAX backends.

But we did want to keep the Conv Ops around because they are still so useful. I think there was an error in this PR: Remove deprecated modules by ferrine · Pull Request #111 · pymc-devs/pytensor · GitHub

Where we kept the abstract_conv, but not the conv module and the rewrites that replace the former (slow) by the latter (fast). I will open an issue on our repo to bring them back. Thanks for flagging it!

ricardoV94 · May 17, 2023, 7:48am

Issue: Efficient Conv Ops were removed accidentally · Issue #305 · pymc-devs/pytensor · GitHub

pbaggens · May 19, 2023, 6:37am

Thanks, that would be helpful.
Also, not that there is a bug in the existing abstract_conv.py, where I had to add the cast in line 682:

filters = pytensor.tensor.cast(filters,dtype=input.dtype)

Without this, abstract_conv will upcast data to float64, even if the input and outout
want to have float32.

ricardoV94 · May 19, 2023, 7:13am

Do you want to open a pull request to fix that in the source code?

pbaggens · May 19, 2023, 7:31am

Hi, so far I have never done a pull request. Not sure how. I’ll read up on that, or I don’t mind if you add that one line for me #:^)

ricardoV94 · May 19, 2023, 8:27am

Sure, in case you’re interested we have a tutorial for PyMC, should be similar for PyTensor: Pull request step-by-step — PyMC 5.3.1 documentation

pbaggens · May 19, 2023, 8:51am

OK, I’ll read up, but not sure when I will have time. Not sure I understood, can you do that 1 line for me? I’d appreciate that.

ricardoV94 · May 19, 2023, 8:55am

If I find the time

pbaggens · September 13, 2024, 11:22am

I’d like to comment on this after a long pause. The updates by jessegrabowski have versions of conv2d that work on GEMM in the “tests” directory. I import them using, for example:
from tests.tensor.conv.test_abstract_conv import conv2d_corr as conv2d
from tests.tensor.conv.test_abstract_conv import conv2d_corr_gi
I have tried these out, and works SUPER! for neural networks and I have tested them extensively . This is a huge help, THANKS!!. They’re orders of magnitude faster that the old abstract_conv.py. But, they are not yet part of standard PYTENSOR. What are the plans for these functions?

jessegrabowski · September 13, 2024, 11:43am

We want to get them back into the codebase but not in a haphazard way. It’s basically waiting for someone to come along and invest time into it.

pbaggens · September 13, 2024, 12:32pm

Great. I have the desire but not the knowledge. I’m a user, not a programmer. otherwise I’d do it. But I must say, that with these faster functions, Pytensor is again a good option for neural networks. In my opinion, better than Tensorflow or Pytorch.

jessegrabowski · September 14, 2024, 4:56am

I agree that these functions should be back in, it’s just not realistic for me to work on it until at least after the new year.

An interim solution would be to clone the PR branch where I restore the functions (as it sounds like you did), but then rebase it onto main. Then you’ll essentially be at the most recent version with these functions “restored”. I don’t think the rebase will break anything (since these functions live off to the side and aren’t implicated in any changes), but these are famous last words.

Topic		Replies	Views
PyMC is Forking Aesara to PyTensor News development , aesara	7	1318	December 16, 2022
UnaryScalarOps for JAX?	3	241	May 19, 2023
Theano Op using JAX for lightning-fast ODE inference Sharing	4	2798	April 26, 2021
Aesara, theano, theano-pymc Development	2	1976	May 27, 2021
Diagnosing Pymc v4 slow sampling - linux, kubernetes notebook aesara	2	569	July 19, 2022

Very slow 2d convolution with Pytensor

Related topics