Hi everyone!

I’m a computational modeler but relatively new to Bayesian methods and PyMC3.

I was hoping to pick your brain on a problem I’m working on:

(I’m new to the forum so I’m sorry if a similar question was asked before, I haven’t been able to find it yet.)

I have a set of coupled first-order ODE’s (twelve or so). They depend on one another, as well as a set of exogenous variables that I model as constants.

They are mostly in the form of dY(t)/dt = par_1 * Y(t) + par_2* X(t) + par_3 * Z(t) + par_4 * exo_var + …

. The ODEs are therefore linear, but they may become nonlinear if I allow interaction terms during model selection at a later point. The total number of uncertain parameters in the model (par_i) goes up to about 70.

I have some data for most of the variables (500 individuals approx, with several data points over time) to calibrate these equations, but with lots and lots of missing data. And of 1-2 variables, I have no data at all. Luckily, as I understand it, PyMC3 automatically performs Bayesian imputation, and I can simply assign a prior distribution to the variables that I do not have data of and then leave them completely unseen.

Now I wish to estimate the posterior distribution and run simulations with this model that makes reasonably accurate predictions. First of all, I’m wondering: is this feasible at all? Or do you believe I need to make additional assumptions to reduce the parameter space?

Second, I’m aware that there’s a API for ODEs, but as I understand it, it’s quite slow with high-dimensional ODEs. How would you suggest I best implement this model as to make it run efficiently with the NUTS sampler?

Finally, on top of all this, as I mentioned, I also have uncertainties regarding the form of the model (e.g., should I add interaction terms, such as par * Y(t) * X(t) ). What kind of model selection procedure would you recommend I could use together with or in PyMC3? I have looked into Bayesian model averaging but that seems to require that I estimate posterior distributions for every model, which is infeasible as the number of possible model specifications are very large as well.

I would really appreciate any thoughts on this (ambitious) project!