GSOC 2021 Project Discussion

Rashmi_Borase · March 28, 2021, 11:08am

I’m a Computer Engineering undergraduate student at Pune Institute of Computer Technology, Pune, India. I’m interested in contributing to the Replace backends with xarray project. I’ve been working in Python for the last 2 years. I’m familiar with general computing libraries in Python such as pandas, numpy and scikit, and also with Jupyter notebooks and git. I’d love to work on this. I went through the backend files (base, ndarray, tracetab).

Thank you
Rashmi Borase

OriolAbril · March 28, 2021, 12:02pm

Tagging prospective mentors @RavinKumar and @colcarroll.

It may also be interesting to take a look at the discussions in "Feature request: Named RV dimensions" (Issue #4565, pymc-devs/pymc3) and "shape vs size keyword argument" (Issue #4552, pymc-devs/pymc3).

As the project idea is not very detailed, I just want to add that the computational backend for pymc3>=4.0 is and will be Aesara, and implementing this new xarray-backed backend will not change that. What I personally still don’t know is how we should go about implementing it. I see two main paths we could take, each with its own pros and cons.

One option would be to have all sampling and calculations in pure Aesara, and use xarray to initialize (and preallocate) a dataset when pymc3 starts sampling and have it updated every iteration with the corresponding sampling results. I think this is not too different from what happens now with the current backends, and it would have the pro of easily integrating with any xarray-backed format. For example, using dask-backed xarray datasets we could probably sample models that don’t fit in memory “easily”.
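To make the first option a bit more concrete, here is a minimal sketch of the preallocate-then-fill idea. It is only an illustration: `preallocate_trace` and `record` are hypothetical helper names, not part of pymc3 or xarray, the storage is a plain NumPy-backed `xarray.Dataset`, and the "sampler" is replaced by hand-written values instead of anything computed with Aesara.

```python
import numpy as np
import xarray as xr


def preallocate_trace(var_shapes, var_dims, coords, n_chains, n_draws):
    """Build a NaN-filled Dataset with (chain, draw, *var_dims) per variable.

    Hypothetical helper: the names and arguments are assumptions made
    for this sketch, not an existing pymc3 API.
    """
    data_vars = {}
    for name, shape in var_shapes.items():
        dims = ("chain", "draw") + tuple(var_dims[name])
        full_shape = (n_chains, n_draws) + tuple(shape)
        data_vars[name] = (dims, np.full(full_shape, np.nan))
    all_coords = {
        "chain": np.arange(n_chains),
        "draw": np.arange(n_draws),
        **coords,
    }
    return xr.Dataset(data_vars, coords=all_coords)


def record(trace, chain, draw, point):
    """Write one sampled point into the preallocated Dataset, in place."""
    for name, value in point.items():
        # .values exposes the underlying NumPy buffer of a NumPy-backed variable
        trace[name].values[chain, draw] = value


trace = preallocate_trace(
    var_shapes={"mu": (), "theta": (3,)},
    var_dims={"mu": (), "theta": ("school",)},
    coords={"school": ["a", "b", "c"]},
    n_chains=2,
    n_draws=4,
)

# Stand-in for one sampler iteration on chain 0, draw 1
record(trace, chain=0, draw=1, point={"mu": 0.5, "theta": np.array([1.0, 2.0, 3.0])})

# Labeled access to what was just stored
stored = trace["theta"].sel(chain=0, draw=1, school="b").item()
```

The in-place write through `.values` relies on the data being NumPy-backed; a dask-backed or on-disk dataset would need a different update mechanism.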

Another option could be to integrate Aesara as a valid xarray data structure, so that we can do everything with xarray’s API while still using Aesara for the actual calculations. This approach is probably much more complicated to make work, but it could allow natively supporting operations with labeled dims and coords (as discussed in one of the issues above). Even if only a subset were supported, it would probably still be very powerful.
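As a rough idea of what "operations with labeled dims and coords" look like, here is a small sketch using ordinary NumPy-backed xarray objects. It does not involve Aesara at all; it only shows the kind of name-based broadcasting and reduction that the second option would aim to make available. The variable names and values are made up for the example.

```python
import numpy as np
import xarray as xr

# Two arrays with different, named dimensions
mu = xr.DataArray(
    np.array([0.0, 1.0, 2.0]),
    dims="group",
    coords={"group": ["a", "b", "c"]},
)
offset = xr.DataArray(
    np.array([10.0, 20.0]),
    dims="time",
    coords={"time": [0, 1]},
)

# Broadcasting happens by dimension name, not by axis position
total = mu + offset  # dims: ("group", "time"), shape: (3, 2)

# Reductions and selections also refer to names and labels
group_mean = total.mean("time")
one_value = total.sel(group="b", time=1).item()
```

No manual reshaping or axis bookkeeping is needed here, which is the main appeal of having labeled dims available at the model level.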

Rashmi_Borase · March 28, 2021, 5:26pm

@OriolAbril Could you please go over the cons of the first approach?

OriolAbril · March 28, 2021, 8:07pm

The main con would be that labeled coords and dims would have to be implemented by us instead of relying on xarray, and that work involves not only creating the implementation but also maintaining it. Another possibly relevant issue: "Add named dimensions" (Issue #352, pymc-devs/aesara).

Note, however, that all this (now and in the post above) is what I expect. I have not tried either of the two approaches, so this is basically a guess.
