GSoC 2019 and pymc4 contribution

CoderINusE · February 24, 2019, 6:24pm

Hey everybody!

My name is Rasul and I’m a DS student at Skoltech, Russia. I really want to contribute to pymc4 and work on it during this summer. I’ve seen a couple of issues and ideas on how the development can progress further. Still, I lack knowledge of how the current architecture is going to be improved and what the global design issues are right now (and that is, I guess, the main part of the project for this GSoC).

Should I concentrate on the open issues for now, or is there a better way to get involved?

ferrine · February 26, 2019, 7:54am

Hi, Rasul! Happy to see you here.

I think commenting with your thoughts on our current issues and ongoing PRs is the best way to get involved. As you may see, pymc4 development right now is largely about testing new directions and discussing design choices. It is a strategic decision to start with a well-designed architecture so that pymc4 is easy to extend.

The global design issue is the lack of Theano-related “neat features” like theano.clone, together with TensorFlow’s graph management, which is add-only. Moreover, TensorFlow is moving towards a functional, eager-first API (like PyTorch). That’s why we prioritize our efforts on functional design.

Goals are:

A model is a function: a data-generating process.
Models don’t change; every modification returns a copy, like in Pandas.
Execute the model function for the first time only after it has been configured.
Make xarray a first-class citizen (used for input data as well as storage of samples).
Allow creation of submodules by treating a model like an RV.
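
To make the first three goals concrete, here is a minimal sketch in plain Python. It is not the pymc4 API: the `Model` class, its `observe` and `sample_prior` methods, and the `linear` function are hypothetical names made up for illustration. It only shows the shape of the idea: the model is a generative function, configuring it returns a copy, and the function body runs only when explicitly requested.

```python
import random
from dataclasses import dataclass, field, replace
from typing import Callable, Mapping


@dataclass(frozen=True)
class Model:
    """Hypothetical immutable wrapper around a generative function."""

    fn: Callable
    observed: Mapping = field(default_factory=dict)

    def observe(self, **data):
        # Return a modified copy instead of mutating self,
        # similar to how most pandas methods behave.
        return replace(self, observed={**self.observed, **data})

    def sample_prior(self, seed=0):
        # The generative function is executed only here,
        # after the model has been configured.
        rng = random.Random(seed)
        return self.fn(rng, self.observed)


def linear(rng, observed):
    # The model itself: a description of how the data are generated.
    slope = rng.gauss(0.0, 1.0)
    noise = rng.gauss(0.0, 0.1)
    x = observed.get("x", 1.0)
    # Use the observed value of y if one was supplied, else generate it.
    y = observed.get("y", slope * x + noise)
    return {"slope": slope, "y": y}


base = Model(linear)                # nothing is executed yet
conditioned = base.observe(y=2.5)   # new object, base is unchanged
draw = conditioned.sample_prior()   # first execution of linear()
```

In this sketch `base.observed` is still empty after the call to `observe`, which is the "models don’t change" property from the list above.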

Challenges are:

Debugging. How does one get a good error message before one starts sampling? In pymc3 the graph was built at runtime and there was no such issue. Now we have delayed construction (but we can run an inspection on the very first run).
Inspection. How do we inspect a pymc4 model like we did in pymc3? For example, setting smart starting values for sampling, transforming variables, etc.
Reparametrization. Choosing a parametrization is a common problem in hierarchical models, e.g. centered vs non-centered. We would like to create a unified way to do that.
Variables as models. Some random variables are themselves a composition of other variables (Horseshoe, WishartBartlett). The best way to deal with this is to treat ANY variable or model as a model with the same API. This direction is not yet well explored. In my opinion it is the best design idea so far.
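
As an illustration of the centered vs non-centered choice mentioned above, here is a minimal sketch using only the Python standard library. The function names `centered` and `non_centered` and the values of `mu` and `tau` are assumptions chosen for the example, not anything from pymc4. Both functions draw from the same Normal(mu, tau) distribution; the non-centered form samples a standard normal `z` and then shifts and scales it, which is the form that often samples better in hierarchical models.

```python
import random
import statistics


def centered(rng, mu, tau, n):
    # Centered: theta_i ~ Normal(mu, tau), drawn directly.
    return [rng.gauss(mu, tau) for _ in range(n)]


def non_centered(rng, mu, tau, n):
    # Non-centered: z_i ~ Normal(0, 1), then theta_i = mu + tau * z_i.
    return [mu + tau * rng.gauss(0.0, 1.0) for _ in range(n)]


mu, tau, n = 2.0, 0.5, 20000  # illustrative values only
a = centered(random.Random(1), mu, tau, n)
b = non_centered(random.Random(2), mu, tau, n)

# Both parametrizations describe the same distribution, so their
# sample means and standard deviations should agree closely.
mean_gap = abs(statistics.mean(a) - statistics.mean(b))
std_gap = abs(statistics.stdev(a) - statistics.stdev(b))
```

A unified reparametrization mechanism would let the user switch between these two forms without rewriting the model function by hand.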

If you have any comments or need clarifications from me or other developers, feel free to ask.
