PyMC3 and ODEs for GSoC

Hello!

I’m Demetri, a PhD student in Biostatistics in Ontario, Canada. My PhD research is in Bayesian Pharmacokinetics, and so naturally the ODE project in GSoC caught my attention.

I was prompted to apply by this tweet, and I realize time is a little limited. I’m reaching out to the community to hopefully gain some insight into the following:

  • What are the goals for this project? What is the minimal functionality envisioned from completing this project? What are nice to haves? This will help me develop the proposal.
  • My software development skills are not as strong as my stats and math, which is part of why I’m applying. What are some sources of advice for developing for PyMC3? I’ve already merged 3 small PRs (including a piece of documentation here).
  • I realize I’m cutting it a little close. Is it still worth it to submit a proposal?

Any insight is appreciated.

A little about me:

  • My B.Sc. and M.Math were in applied math, so I am very comfortable with ODEs and their numerical solutions.
  • My Bayesian work is mostly done in Stan because it has ODE capabilities. I would love to help bring PyMC3 into this realm.
  • Website here: https://dpananos.github.io/
  • Github: https://github.com/dpananos
  • I think the feather in my GitHub cap is GyMBo: Gym Monitoring Bot. Though not Bayesian, it is where I (slowly) practice being a developer.

I’m interested in the ODE project but not knowledgeable about it. I want to point you at https://docs.pymc.io/developer_guide.html, which is a very good resource for the internal workings of PyMC3.

It might be specifically useful to understand that the with pm.Model() as model: construction is not a common one in Python, but is useful here for various reasons.
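To make the context-manager idea concrete, here is a minimal sketch in plain Python. It is not PyMC3’s implementation: the `Model` and `Variable` classes below are hypothetical stand-ins, and the only assumption is that the library keeps track of which model is currently "open" so that variables created inside the `with` block can register themselves with it.

```python
class Model:
    """Toy stand-in for a model container (hypothetical, not PyMC3's class)."""

    _stack = []  # models currently open in a `with` block, innermost last

    def __init__(self):
        self.variables = {}

    def __enter__(self):
        # Entering the `with` block makes this model the active one.
        Model._stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Leaving the block deactivates it again; exceptions are not swallowed.
        Model._stack.pop()
        return False

    @classmethod
    def current(cls):
        if not cls._stack:
            raise RuntimeError("No model on the context stack")
        return cls._stack[-1]


class Variable:
    """Toy variable that registers itself with whichever model is active."""

    def __init__(self, name):
        self.name = name
        Model.current().variables[name] = self


with Model() as model:
    # Neither variable is told about `model` explicitly; it is found via the stack.
    k = Variable("k")
    sigma = Variable("sigma")

print(sorted(model.variables))
```

The payoff of this pattern is that a model definition reads like a list of declarations, with no need to pass the model object into every variable.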

Also, I encourage applying.

Thanks, I really appreciate the link.

Hi, as a disclaimer, I’m more of a developer mentor than an ODE expert :) Michael (@michaelosthege) has much more expertise on the ODE side than I do, so I will ask him to improve on my answer.

I think the goal of the project is to develop an API for ODE modeling in PyMC3, since at the moment everything is done by hand, as you can see in the ODE-related notebooks. Designing and developing the API takes a lot of time, and we have not had enough of it to concentrate on this particular problem. That’s why it is a GSoC project. We encourage students to be creative and to design the API in collaboration with us.

About “nice to haves”: first, we need a clear API. Second is a fast solver (one that caches intermediate results). The last one, as I remember, is support for “asynchronous observations” (@michaelosthege, can you comment on this more?)

This project is quite complicated from a developer’s perspective, but I think we can provide the necessary expertise in Theano internals and its tricks. If you know what OOP is, that is at least a good start, and following code reviews will help you understand which way to go. This might be challenging and interesting. (I hope I’m not scaring you with this info :sweat_smile:)

@michaelosthege, can you comment on “asynchronous observations”?

Hi @dpananos, I just added more details to the GSoC 2019 Projects Wiki page.

What I mean by “asynchronous observations”:
Take the Lorenz attractor, for example, with states (x, y, z). Now imagine that the states are observed independently, so at some time t5 there is data for x and y but not for z, and at another time t6 there is data only for z.

In all ODE example notebooks that I’ve seen so far, all states are observed at all timepoints, which is rarely the case with real data.

I’m pointing this out explicitly, because it makes a difference for the design of the Theano Ops.
It’s not too hard to deal with, but it would be best to have it in mind right away.
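As a rough illustration of what asynchronous observations imply for the bookkeeping, here is a NumPy-only sketch. It is not a Theano Op and not a proposed PyMC3 API: the fixed-step RK4 solver, the observation schedule, and all names (`rk4_solve`, `obs_times`, `predicted`) are assumptions made up for this example. The idea shown is to solve the ODE once on the union of all observation times and then, for each state, keep only the time points at which that state was actually observed.

```python
import numpy as np


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])


def rk4_solve(f, y0, times, substeps=20):
    """Fixed-step RK4; returns the state at every entry of `times`."""
    y = np.asarray(y0, dtype=float)
    out = [y.copy()]
    for t0, t1 in zip(times[:-1], times[1:]):
        h = (t1 - t0) / substeps
        for _ in range(substeps):
            k1 = f(y)
            k2 = f(y + 0.5 * h * k1)
            k3 = f(y + 0.5 * h * k2)
            k4 = f(y + h * k3)
            y = y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        out.append(y.copy())
    return np.array(out)


# Hypothetical schedule: each state has its own observation times.
# At t=0.1 only x and y are observed; at t=0.4 only z is.
obs_times = {
    "x": np.array([0.1, 0.2, 0.5]),
    "y": np.array([0.1, 0.3, 0.5]),
    "z": np.array([0.2, 0.4, 0.6]),
}
state_index = {"x": 0, "y": 1, "z": 2}

# Solve once on the sorted union of all observation times (plus t=0).
t_all = np.unique(np.concatenate([[0.0], *obs_times.values()]))
solution = rk4_solve(lorenz, [1.0, 1.0, 1.0], t_all)

# For each state, keep only the rows at which that state was observed.
predicted = {}
for name, times in obs_times.items():
    rows = np.searchsorted(t_all, times)
    predicted[name] = solution[rows, state_index[name]]
```

Each entry of `predicted` can then be compared against that state’s own data vector in the likelihood, so states with different numbers of observations never need to share one rectangular array.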
