Optimizing matmul operations with PyTensor

Nobody asked for it, but I’ve recorded myself implementing a simple PyTensor rewrite, and it only took 2h30 haha (feel free to speed it up, as I always do):

Perhaps you’ll learn a thing or two about PyMC’s backend, or teach me something that I may have gotten very wrong.

PS: The audio is pretty crappy :)