Hi everyone,
My master’s thesis project requires sampling the parameters of Physiologically-Based Pharmacokinetic (PBPK) ODE models defined in SBML and solved with Roadrunner. This is achieved using the PyTensor wrapper approach described in the examples.
My implementation can be found in this repo. The particular wrapper is defined here while the PyMC model is here.
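For orientation, here is a minimal sketch of that wrapping pattern, not the repo’s actual code: the SBML file name, parameter ids, simulation window, and output column are all placeholders.

```python
import numpy as np
import pytensor.tensor as pt
import roadrunner
from pytensor.graph.basic import Apply
from pytensor.graph.op import Op

rr = roadrunner.RoadRunner("pbpk_model.xml")  # placeholder SBML file

class PBPKOp(Op):
    """Run one Roadrunner forward simulation for a (k, CL) pair."""

    def make_node(self, k, CL):
        k = pt.as_tensor_variable(k)
        CL = pt.as_tensor_variable(CL)
        return Apply(self, [k, CL], [pt.dvector()])

    def perform(self, node, inputs, output_storage):
        k, CL = inputs
        rr.resetAll()
        rr.setValue("k", float(k))     # placeholder SBML parameter ids
        rr.setValue("CL", float(CL))
        res = rr.simulate(0, 10, 101)  # start, stop, number of points
        output_storage[0][0] = np.asarray(res[:, 1])  # e.g. plasma concentration column

pbpk_simulate = PBPKOp()
```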
The steps to set up the environment and reproduce the results are outlined in the README.
At the moment, sampling does not accurately recover the target distribution. I have tried the other available samplers with no success.
Could someone on the Discourse please take a look at the source code, reproduce the results, and let me know what the issue is?
My hypothesis is that only a gradient-based sampler like NUTS could lead to better results, as the parameter space of the likelihood is quite complex and the gradient-free samplers give quite biased results.
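For concreteness, selecting a gradient-free step method explicitly looks roughly like this. The model below is only a stand-in with placeholder priors, not the actual PBPK likelihood, and DEMetropolisZ is just one example of the alternative samplers I mean:

```python
import pymc as pm

with pm.Model() as toy_model:  # stand-in for the model in bayes_example.py
    k = pm.LogNormal("k", mu=0.0, sigma=1.0, shape=5)   # placeholder priors
    CL = pm.LogNormal("CL", mu=0.0, sigma=1.0, shape=5)
    # likelihood omitted in this sketch

    # Explicitly request a gradient-free step method instead of NUTS.
    idata = pm.sample(step=pm.DEMetropolisZ(), draws=1000, tune=1000, chains=3)
```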
Thanks for any contribution! I am more than happy to answer any questions or comments about the documentation or about reproducing the results.
To give more details on the issues encountered during sampling, here is a description of the workflow when the user runs bayes_example.py in the repo.
First, a set of parameters (the absorption constant k and the clearance constant CL) is sampled from a log-normal distribution. Let’s refer to them as the true parameters. In my run, these are:
k = [2.81502005, 1.53297557, 4.03010654, 1.12139799, 2.59723459]
CL = [1.63582935, 1.33484215, 3.31452979, 1.91527733, 1.13722572]
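The generation of these true values is roughly the following (the seed and the log-normal hyperparameters here are placeholders; the actual ones are set in bayes_example.py):

```python
import numpy as np

rng = np.random.default_rng(42)  # placeholder seed

n_sim = 5
# Placeholder log-normal hyperparameters for the "true" parameters.
k_true = rng.lognormal(mean=0.5, sigma=0.5, size=n_sim)
CL_true = rng.lognormal(mean=0.5, sigma=0.5, size=n_sim)
```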
For each pair of values, a forward simulation is done using the SBML-defined ODE model and the Roadrunner solver. This results in an xarray.Dataset with two dimensions: sim (the number of simulations, i.e. 5) and time (the number of time points produced by the Roadrunner solver, i.e. 101).
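Schematically, this step looks like the sketch below (file name, parameter ids, simulation window, and output column are again placeholders):

```python
import numpy as np
import roadrunner
import xarray as xr

rr = roadrunner.RoadRunner("pbpk_model.xml")  # placeholder SBML file

# k_true and CL_true as sampled in the sketch above.
trajectories = []
for k_i, CL_i in zip(k_true, CL_true):
    rr.resetAll()
    rr.setValue("k", float(k_i))    # placeholder SBML parameter ids
    rr.setValue("CL", float(CL_i))
    res = rr.simulate(0, 10, 101)   # 101 time points per simulation
    trajectories.append(res[:, 1])  # e.g. plasma concentration column

data = xr.Dataset(
    {"conc": (("sim", "time"), np.stack(trajectories))},
    coords={"sim": np.arange(len(trajectories)), "time": res[:, 0]},
)
```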
This dataset is the one used by the Bayesian model to estimate the true parameters. After the MCMC sampler is done, the summary statistics for my run are as follows:
Sampling 3 chains for 2_000 tune and 4_000 draw iterations (6_000 + 12_000 draws total) took 1098 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
median mad eti_3% eti_97% mcse_median ess_median ess_tail r_hat
k[0] 1.184 0.350 0.527 2.862 0.006 10802.915 9126.0 1.0
k[1] 1.142 0.347 0.498 2.788 0.007 11948.263 8823.0 1.0
k[2] 1.201 0.349 0.538 2.895 0.006 11661.139 7922.0 1.0
k[3] 1.124 0.346 0.481 2.726 0.007 11963.135 9009.0 1.0
k[4] 1.187 0.351 0.519 2.822 0.005 11831.302 8968.0 1.0
CL[0] 1.001 0.320 0.398 2.522 0.006 11818.290 9040.0 1.0
CL[1] 0.997 0.331 0.387 2.579 0.006 11709.851 8889.0 1.0
CL[2] 0.993 0.328 0.382 2.592 0.006 11906.557 8057.0 1.0
CL[3] 0.994 0.322 0.394 2.486 0.006 11381.412 8591.0 1.0
CL[4] 0.979 0.321 0.390 2.518 0.006 12063.211 9134.0 1.0
sigma 4.949 0.101 4.681 5.249 0.002 12108.402 8454.0 1.0
The medians of the MCMC samples for the k and CL parameters do not match the true parameters. That is the issue I am encountering with the model.
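For completeness, the Bayesian model is structured roughly like this (the priors and hyperparameters shown are placeholders, not the exact ones in the repo; pbpk_simulate is the wrapper Op and data the dataset sketched earlier):

```python
import pymc as pm
import pytensor.tensor as pt

with pm.Model() as pbpk_model:
    k = pm.LogNormal("k", mu=0.0, sigma=1.0, shape=5)    # placeholder priors
    CL = pm.LogNormal("CL", mu=0.0, sigma=1.0, shape=5)
    sigma = pm.HalfNormal("sigma", sigma=5.0)

    # One simulated trajectory per (k, CL) pair via the wrapper Op.
    mu = pt.stack([pbpk_simulate(k[i], CL[i]) for i in range(5)])

    pm.Normal("obs", mu=mu, sigma=sigma, observed=data["conc"].values)

    idata = pm.sample(draws=4000, tune=2000, chains=3)
```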
Thanks again for any feedback and please let me know if more details are needed.
What do the trace plots look like? What were the priors? If it’s just returning the priors, it could mean there is a bug. Your codebase is quite complex, but if you can give me a small snippet that I can copy and paste to run locally and that captures the problem (with imports and a data generation script), I can have a try.
Thanks Jesse for the interest. Indeed, the sampler seems to be drawing from the priors without taking the likelihood into account. Here are the trace plots:
You are right, the project is quite complex. I might be able to put together a minimal running example that you can copy and paste. If you or anyone else has time to clone the repo and set up the environment, reproducing the results is simply a matter of running the bayes_example.py script.