Thanks again for your excellent reply on this topic.
I’ve put some more thought into it and I’m wondering what you think about this workaround I have in mind. Hope you don’t mind me sending it by you. I’m very curious what you think.
As I said, I have a large system of ODEs. Actually it’s a system of differential algebraic equations, with around 10 differential (about 50-60 params) and 20 algebraic equations (about 80 params).
To make the parameter estimation more feasible, one thing I do is to run NUTS on the algebraic equations separately from the differential equations (as simple regressions) and then simply plug in the MAP parameters as deterministic (or the whole distributions as priors) when I’m running NUTS for the DEs . The latter procedure would then involve a numerical solver for the remaining ODE system.
However, even this shortcut is rather slow (even when implemented with jax odeint, vmap, jit, etc.), and indeed seems to have identifiability issues, so I’m looking for further shortcuts. Another one would be to even estimate the ODEs independently from one another by calculating slopes from the data and then estimating those, again using regression. Of course this would be a very rough approximation, but the results actually seem reasonable.
Now to come back to this topic. When I estimate the parameters in these equations separately like this, suddenly it seems feasible to use hierarchical regression (which for a random intercept model takes only 5 minutes per equation for 500 individuals). This would basically allow me to do (E) for all the parameters.
I’m curious what you think of this option? I would ultimately like to use this model for personalized medicine so hierarchical parameters would be very nice to have. That said, of course the ODE approximation is not very solid, but perhaps it’s good enough for my purposes (e.g., if it has decent predictive accuracy on a test set).
I’m also worried about how realistic the uncertainty is that i’m left with when I finally run the entire differential algebraic system (while sampling from all these different posteriors).I guess I’m throwing away uncertainty by calculating (and averaging) slopes, but I’m also throwing away information by estimating the equations separately.
I don’t really have the identifiability issue now, but it’s based on a bit of a trick (separate estimation) and I’m not sure if the trick is valid. However, since the model is based on a scientific conceptual model, I would ideally not do any model reduction. So from that perspective this may be the best option.