I’m a Computer Engineering undergraduate student at Pune Institute of Computer Technology, Pune, India. I’m interested in contributing to the “Replace backends with xarray” project. I’ve been working in Python for the last 2 years. I’m familiar with general computing libraries in Python such as pandas, numpy and scikit, and also with Jupyter notebooks and git. I’d love to work on this. I went through the backend files (base, ndarray, tracetab).
Tagging prospective mentors @RavinKumar and @colcarroll.
It may also be interesting to take a look at the discussions in “Feature request: Named RV dimensions” (Issue #4565, pymc-devs/pymc3) and “shape vs size keyword argument” (Issue #4552, pymc-devs/pymc3).
As the project idea is not very detailed, I just want to add that the computational backend for pymc3>=4.0 is and will be Aesara, and implementing this new xarray-backed backend will not change that. What I personally still don’t know is how we should go about implementing it. I see two main paths we could take, each with its own pros and cons.
One option would be to have all sampling and calculations in pure Aesara, and use xarray to initialize (and preallocate) a dataset when pymc3 starts sampling, then update it every iteration with the corresponding sampling results. I think this is not too different from what happens now with the current backends, and it would have the pro of easily integrating with any xarray-backed format, e.g. using dask-backed xarray datasets we could probably sample models that don’t fit in memory “easily”.
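To make the first option a bit more concrete, here is a minimal sketch of the preallocate-then-fill pattern. It is not pymc3 code: `preallocate_trace` and `record` are hypothetical helper names, the variables `mu` and `theta` and the `school` dimension are made up, and a NumPy random generator stands in for the Aesara-based sampler.

```python
import numpy as np
import xarray as xr


def preallocate_trace(var_dims, coords, n_chains, n_draws):
    """Hypothetical helper: build a NaN-filled Dataset with leading (chain, draw) dims."""
    all_coords = {"chain": np.arange(n_chains), "draw": np.arange(n_draws), **coords}
    data_vars = {}
    for name, dims in var_dims.items():
        full_dims = ("chain", "draw") + tuple(dims)
        shape = tuple(len(all_coords[d]) for d in full_dims)
        data_vars[name] = (full_dims, np.full(shape, np.nan))
    return xr.Dataset(data_vars, coords=all_coords)


def record(trace, chain, draw, point):
    """Hypothetical helper: write one sampled point into its preallocated slot."""
    for name, value in point.items():
        trace[name][dict(chain=chain, draw=draw)] = value


# Made-up model: a scalar `mu` and a vector `theta` over three schools.
trace = preallocate_trace(
    var_dims={"mu": (), "theta": ("school",)},
    coords={"school": ["A", "B", "C"]},
    n_chains=2,
    n_draws=5,
)

rng = np.random.default_rng(0)  # stand-in for the real sampler
for chain in range(2):
    for draw in range(5):
        point = {"mu": rng.normal(), "theta": rng.normal(size=3)}
        record(trace, chain, draw, point)
```

After the loop, `trace` is an ordinary labeled dataset, so something like `trace["theta"].sel(school="B").mean(dim="draw")` works without any extra bookkeeping.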
Another option could be to integrate Aesara as a valid xarray data structure, so that we can do everything with xarray’s API while still using Aesara for the actual calculations. This approach is probably much more complicated to make work, but it could allow natively supporting operations with labeled dims and coords (as discussed in one of the issues above). Even if it only covered a subset of operations, it would probably still be very powerful.
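For reference, this is the kind of labeled operation the second option would aim to support. The sketch below uses ordinary NumPy-backed xarray objects, not Aesara tensors (wrapping Aesara is exactly the untested part), and the names `mu`, `offset`, `group` and `time` are made up for illustration.

```python
import numpy as np
import xarray as xr

# Two arrays with different, named dimensions.
mu = xr.DataArray(np.array([0.0, 1.0]), dims=("group",), coords={"group": ["a", "b"]})
offset = xr.DataArray(np.array([1.0, 2.0, 3.0]), dims=("time",), coords={"time": [0, 1, 2]})

# Broadcasting happens by dimension name, with no manual reshaping.
z = mu + offset  # dims ("group", "time")

# Reductions and selections refer to labels rather than axis positions.
group_mean = z.mean(dim="time")
value_b2 = z.sel(group="b", time=2).item()
```

With positional arrays the same sum would need something like `mu[:, None] + offset[None, :]`, and the reduction would need the right axis number.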
@OriolAbril Could you please go over the cons of the first approach?
The main con would be that labeled coords and dims would have to be implemented by us instead of relying on xarray, and that work means not only creating the feature but also maintaining it. Another possibly relevant issue: “Add named dimensions” (Issue #352, pymc-devs/aesara).
Note, however, that all of this (here and in the post above) is what I expect; I have not tried either of the two approaches, so this is basically a guess.