ODE Bayesian model not sampling from posterior distribution of parameters

Hi everyone,

My master’s thesis project requires sampling the parameters of Physiologically-Based Pharmacokinetic (PBPK) ODE models defined in SBML and solved with Roadrunner. This is achieved using the PyTensor wrapper described in the examples.

My implementation can be found in this repo. The particular wrapper is defined here while the PyMC model is here.

The steps to set up the environment and reproduce the results are outlined in the README.

At the moment, the model is not able to accurately sample from the target distribution. I have tried the other available samplers with no success.

Could someone on the Discourse please take a look at the source code, reproduce the issue, and let me know what the problem is?

My hypothesis is that only a gradient-based sampler like NUTS could lead to better results, as the parameter space for the likelihood is quite complex and the gradient-free sampler results are quite biased.

Thanks for any contribution! More than happy to answer any questions/comments about the documentation/reproducing the results :smile:

To give more details on the issues encountered during sampling, here is a description of the workflow when the user runs bayes_example.py in the repo.

First, a set of parameters (i.e. the absorption constant k and the clearance constant CL) is sampled from a log-normal distribution. Let’s refer to them as the true parameters. In my run, these are:

k = [2.81502005, 1.53297557, 4.03010654, 1.12139799, 2.59723459]
CL = [1.63582935, 1.33484215, 3.31452979, 1.91527733, 1.13722572]

For each pair of values, a forward simulation is run using the SBML-defined ODE model and the Roadrunner solver. This results in an xarray.Dataset with 2 dimensions: sim (the number of simulations, i.e. 5) and time (the number of time steps on the Roadrunner solver grid, i.e. 101).
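The data-generation step above can be sketched as follows. This is a self-contained approximation, not the repo's code: the closed-form one-compartment curve stands in for the SBML/Roadrunner simulation, and the LogNormal(0, 0.5) hyperparameters, seed, and time grid are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)  # hypothetical seed


def forward(k, cl, times):
    # Stand-in for the Roadrunner solve: one-compartment PK model with
    # first-order absorption, unit dose and unit volume.
    return k / (k - cl) * (np.exp(-cl * times) - np.exp(-k * times))


# "True" parameters, drawn from an assumed log-normal distribution
k_true = rng.lognormal(mean=0.0, sigma=0.5, size=5)   # absorption constants
cl_true = rng.lognormal(mean=0.0, sigma=0.5, size=5)  # clearance constants

times = np.linspace(0.0, 10.0, 101)  # 101 solver time steps

# One curve per (k, CL) pair -> array of shape (sim=5, time=101),
# which is what gets packed into the xarray.Dataset
data = np.stack([forward(k, cl, times) for k, cl in zip(k_true, cl_true)])
print(data.shape)  # (5, 101)
```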

This dataset is then used by the Bayesian model to estimate the true parameters. After the MCMC sampler is done, the summary statistics for my run are as follows:

Sampling 3 chains for 2_000 tune and 4_000 draw iterations (6_000 + 12_000 draws total) took 1098 seconds. 0 divergences.
We recommend running at least 4 chains for robust computation of convergence diagnostics

      median    mad  eti_3%  eti_97%  mcse_median  ess_median  ess_tail  r_hat
k[0]    1.184  0.350   0.527    2.862        0.006   10802.915    9126.0    1.0
k[1]    1.142  0.347   0.498    2.788        0.007   11948.263    8823.0    1.0
k[2]    1.201  0.349   0.538    2.895        0.006   11661.139    7922.0    1.0
k[3]    1.124  0.346   0.481    2.726        0.007   11963.135    9009.0    1.0
k[4]    1.187  0.351   0.519    2.822        0.005   11831.302    8968.0    1.0
CL[0]   1.001  0.320   0.398    2.522        0.006   11818.290    9040.0    1.0
CL[1]   0.997  0.331   0.387    2.579        0.006   11709.851    8889.0    1.0
CL[2]   0.993  0.328   0.382    2.592        0.006   11906.557    8057.0    1.0
CL[3]   0.994  0.322   0.394    2.486        0.006   11381.412    8591.0    1.0
CL[4]   0.979  0.321   0.390    2.518        0.006   12063.211    9134.0    1.0
sigma   4.949  0.101   4.681    5.249        0.002   12108.402    8454.0    1.0

The medians of the MCMC samples for the k and CL parameters do not match the true parameters. That is the issue I am encountering with the model.
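One striking feature of the table is that every median sits near 1 regardless of the true value, which is what you would see if the posterior simply equals the prior. A quick check is to compare the table against the prior's analytic quantiles. A sketch, assuming a hypothetical LogNormal(mu=0, sigma=0.6) prior — substitute the actual prior used in the model:

```python
import numpy as np
from scipy import stats

# Hypothetical prior k, CL ~ LogNormal(mu=0, sigma=0.6); replace with
# the prior actually used in the PyMC model.
prior = stats.lognorm(s=0.6, scale=np.exp(0.0))

print("prior median :", prior.median())   # exp(mu) = 1.0 for mu = 0
print("prior eti_3% :", prior.ppf(0.03))
print("prior eti_97%:", prior.ppf(0.97))
```

If these numbers line up with the `median`, `eti_3%`, and `eti_97%` columns above, the likelihood is effectively being ignored (e.g. the custom Op's output is not actually connected to the observed data).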

Thanks again for any feedback and please let me know if more details are needed.

What do the trace plots look like? What were the priors? If it’s just returning the priors, it could mean there is a bug. Your codebase is quite complex, but if you can give me a small snippet I can copy+paste and run locally, and which captures the problem (with imports and a data-generation script), I can have a try.

Thanks Jesse for the interest. Indeed, the sampler seems to be drawing from the priors without taking the likelihood into account. Here are the trace plots:

You are right, the project is complex. I might be able to put together a minimal running example so you can copy and paste it. If you or anyone else has time to clone the repo and set up the environment, reproducing the results is simply a matter of running the bayes_example.py script.