Hierarchical ODEs

Hi all,

I’m wondering if anyone here has any experience, or ever eve heard about, hierarchical ordinary differential equation models?

For instance, say I have a longitudinal data set of N=500 with pre-patients and patients. Say we have a set of ODEs to describe the dynamics.

We then have various options, e.g.:

  • (A) Fit a single model to the data with one set of parameters.
  • (B) Fit two distinct models to the data (for pre-patients and patients) with each a fully independent set of parameters
  • (C) Fit 500 distinct models to the data for all individuals in the data with fully independent sets of parameters

but I’m guessing also (not sure how difficult it would be):

  • (D) Fit hierarchical ODE model with unique but dependent parameters for both groups

and possibly even:

  • (E) Fit hierarchical ODE model with unique but dependent parameters for all N=500 individuals in the data.

Does anyone have any experiences with such models? Is it worthwhile looking into them, or would it be best to stick to A&B&C? I think E would be the most interesting, as I’m looking to create a personalized disease model that works for a very heterogeneous population. However, my ODE model is already enormous (>100 parameters) so that does complicate things also for C and perhaps even for B.

That’s cool! I’m curious about your system, what are the ODE parameters/model?

D and E are possible, I’ve done that for smaller (~5 parameter) ODE models. It’ll definitely be slow, though, especially with 100-parameter ODE systems and even with variational inference, so make sure your ODE itself evaluates as quickly as possible (probably use JAX/jit/jacobians/etc).

You’ll probably need fairly informative priors, otherwise the likelihood will explode. I’d probably do (C) either with NLLS/MLE fitting or Bayesian, then use the values you get from that to define your priors for D/E. Even for simple ODEs I’ve run into problems with ADVI where it keeps running into regions that give a bunch of nans in the likelihood, and wasn’t able to find a way around it.

D might be unnecessary, if you have roughly equal membership in each class and there isn’t some part of the model that is better informed by one class vs the other. Your top-level parameters effectively have two observations, so they won’t be particularly informative.

A big issue with ODEs, particularly giant ones like that, is parameter identifiability. Many parameters can’t be uniquely determined from the data, often only ratios can. This paper gives a bit of an overview, but there’s a lot else out there. Fixing this involves reparameterizing your ODE or probabilistic models, which is non-trivial and a bit specific to the particulars of your system.

Good luck!

Thanks for your helpful response @BioGoertz!

The ODEs all describe various aspects related to dementia, such as cognitive decline, brain volume, and risk factors like blood pressure, etc.

You’re making a really good point regarding the identifiability. In fact, I think I’ll abandon this idea for now because I think it’s not really realist. I will opt for B for now. Good to know that D likely wouldn’t add much.

I hope to play around with a hierarchical version of a smaller set of ODEs sometime later!

Hi @biogoertz,
Thanks again for your excellent reply on this topic.

I’ve put some more thought into it and I’m wondering what you think about this workaround I have in mind. Hope you don’t mind me sending it by you. I’m very curious what you think.

As I said, I have a large system of ODEs. Actually it’s a system of differential algebraic equations, with around 10 differential (about 50-60 params) and 20 algebraic equations (about 80 params).

To make the parameter estimation more feasible, one thing I do is to run NUTS on the algebraic equations separately from the differential equations (as simple regressions) and then simply plug in the MAP parameters as deterministic (or the whole distributions as priors) when I’m running NUTS for the DEs . The latter procedure would then involve a numerical solver for the remaining ODE system.

However, even this shortcut is rather slow (even when implemented with jax odeint, vmap, jit, etc.), and indeed seems to have identifiability issues, so I’m looking for further shortcuts. Another one would be to even estimate the ODEs independently from one another by calculating slopes from the data and then estimating those, again using regression. Of course this would be a very rough approximation, but the results actually seem reasonable.

Now to come back to this topic. When I estimate the parameters in these equations separately like this, suddenly it seems feasible to use hierarchical regression (which for a random intercept model takes only 5 minutes per equation for 500 individuals). This would basically allow me to do (E) for all the parameters.

I’m curious what you think of this option? I would ultimately like to use this model for personalized medicine so hierarchical parameters would be very nice to have. That said, of course the ODE approximation is not very solid, but perhaps it’s good enough for my purposes (e.g., if it has decent predictive accuracy on a test set).

I’m also worried about how realistic the uncertainty is that i’m left with when I finally run the entire differential algebraic system (while sampling from all these different posteriors).I guess I’m throwing away uncertainty by calculating (and averaging) slopes, but I’m also throwing away information by estimating the equations separately.

I don’t really have the identifiability issue now, but it’s based on a bit of a trick (separate estimation) and I’m not sure if the trick is valid. However, since the model is based on a scientific conceptual model, I would ideally not do any model reduction. So from that perspective this may be the best option.