The numpy broadcasting rules are actually a bit of a pain to work with in pytensor too. If you follow them strictly, you can't tell until runtime what, say, a multiplication of two arrays will actually do: depending on whether a dimension has length one, it might broadcast, raise an error, or multiply elementwise. And which one happens matters for the derivatives, because the gradient of a broadcast multiply has to be summed over the broadcast axis, while the gradient of a plain elementwise multiply doesn't.
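To make that concrete, here's a minimal plain-numpy sketch (the shapes, variable names, and the manual gradient are my own illustration, not pytensor code):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])       # shape (3,)

# The same expression `x * w` does three different things at runtime,
# depending only on the other operand's shape:
x * np.array([4.0, 5.0, 6.0])       # shape (3,): plain elementwise multiply
x * np.array([4.0])                 # shape (1,): broadcasts 4.0 across x
# x * np.array([4.0, 5.0])          # shape (2,): raises ValueError

# Gradient consequence for y = x * w with upstream gradient g:
g = np.ones(3)
w_elem = np.array([4.0, 5.0, 6.0])
w_bcast = np.array([4.0])
grad_w_elem = g * x                         # shape (3,), matches w_elem
grad_w_bcast = (g * x).sum(keepdims=True)   # shape (1,), matches w_bcast
```

So a graph library that wants to build the gradient graph ahead of time has to know, before any values exist, which of these cases it's in.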
If I could go back in time, I'd love to tell the numpy devs not to use a dimension length of 1 to indicate that something can be broadcast, but a dimension length of None instead. I think that would have prevented a lot of strange bugs over the years…