It seems like you profiled at least one of the bottlenecks quite well. I am still relatively new to optimizing complex mathematical operations, so what would be the obvious way to tackle this particular bottleneck? Reusing previously allocated memory?
I am genuinely wondering here. The ODE functionality in PyMC3 is already quite amazing!