I’m using an erfcinv in a model and get a warning from initialize_fuseable_mappings to the effect that we can’t do loop fusion because there’s no C implementation available / used.
Do you have any feel for the kind of speedup one might get from this? I know that's nearly impossible to guess, but are we talking fractional percentage gains in speed, or multiples?
I’ve not used JAX before and TBH am a little apprehensive that it would mean days of work to find & fix numerical issues etc. (unfounded?)
But I do see JAX has an erfinv, and (without knowing anything about how it works) might it be possible in theory to JAXify scipy.special.erfcinv too?
Those functions are automatically picked up by PyTensor's jaxify machinery based on their names; there should be no need for you to do anything.
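As a quick check (a minimal sketch, assuming both pytensor and jax are installed), you can compile a tiny graph in JAX mode and see whether the Op dispatches cleanly:

```python
import pytensor
import pytensor.tensor as pt

x = pt.vector("x")
y = pt.erfcinv(x)

# mode="JAX" routes compilation through pytensor's JAX backend; if the Op has
# a JAX dispatch registered, this compiles and runs without complaint.
f = pytensor.function([x], y, mode="JAX")
print(f([0.5, 1.0, 1.5]))
```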
I mentioned it more as a benchmark to assess the max speedup you could hope for by doing a C impl.
Using numpyro/nutpie for sampling is likely to shift sampling speed much more than adding C code for this one Op, but that requires more care so I didn’t suggest that directly.
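For reference (a minimal sketch, assuming a recent PyMC where pm.sample accepts a nuts_sampler argument), switching samplers is a one-line change:

```python
import pymc as pm

with pm.Model():
    mu = pm.Normal("mu", 0, 1)
    pm.Normal("obs", mu, 1, observed=[0.1, -0.3, 0.2])

    # JAX-based NUTS via numpyro; use nuts_sampler="nutpie" for the
    # numba-compiled sampler instead.
    idata = pm.sample(nuts_sampler="numpyro")
```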
Scipy already has a C implementation of erfcinv here I think? I’m not sure what would be required to bring it into pytensor.
It seems like it should be possible to write a numba overload for it though, and use nutpie for a speedup. I’m surprised it’s not already overloaded in numba-scipy like most of the scipy.special module.
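Something along these lines might work as an overload (a rough sketch, not tested with nutpie; it assumes scipy.special.cython_special exposes ndtri under that plain name and uses the classic identity erfcinv(y) = -ndtri(y/2)/sqrt(2)):

```python
import ctypes
import math

import numba
import scipy.special
from numba.extending import get_cython_function_address, overload

# Wrap the C-level cephes ndtri that scipy exposes through cython_special.
# (Assumption: the capsule is named plain "ndtri"; fused signatures would use
# "__pyx_fuse_..." names instead.)
_addr = get_cython_function_address("scipy.special.cython_special", "ndtri")
_ndtri = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)(_addr)


@overload(scipy.special.erfcinv)
def _erfcinv_overload(y):
    # erfcinv(y) = -ndtri(y / 2) / sqrt(2): the identity scipy itself used
    # before erfcinv became a native ufunc.
    def impl(y):
        return -_ndtri(y / 2.0) / math.sqrt(2.0)

    return impl


@numba.njit
def demo(y):
    return scipy.special.erfcinv(y)


print(demo(0.5), scipy.special.erfcinv(0.5))  # should agree to float precision
```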
To bring the scipy implementation into pytensor, just copy the files and make sure the Op loads the headers / references the function in its c_code method. It's not too bad, but requires some trial and error.
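Roughly, the Op would end up looking something like this (an illustrative sketch only: cephes_erfcinv.h and cephes_erfcinv are placeholder names for whichever files/function get copied in):

```python
from pytensor.scalar.basic import UnaryScalarOp, upgrade_to_float_no_complex


class Erfcinv(UnaryScalarOp):
    """Sketch of an erfcinv scalar Op with a C implementation."""

    def impl(self, x):
        # Python fallback, same as the current scipy-only Op
        import scipy.special

        return scipy.special.erfcinv(x)

    def c_headers(self, **kwargs):
        # Header copied from scipy/cephes (placeholder file name)
        return ["cephes_erfcinv.h"]

    def c_code(self, node, name, inputs, outputs, sub):
        (x,) = inputs
        (z,) = outputs
        # Call the copied C routine (placeholder function name)
        return f"{z} = cephes_erfcinv({x});"


erfcinv = Erfcinv(upgrade_to_float_no_complex, name="erfcinv")
```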
Importantly, that would be good for users in general, but I doubt it will fix a bottleneck in any specific model (probabilistically speaking).
Thanks for sharing. I don’t know how easy it is to include GSL stuff for our Windows users, so my inclination would be to go with the scipy/cephes implementation.