Meta-analytic predictive priors in PyMC

The idea of meta-analytic-predictive (MAP) priors is to use information from a historical data analysis (e.g. a previous clinical trial) in a new data analysis (e.g. a new clinical trial).
It is best described in these two publications:
Robust meta-analytic-predictive priors in clinical trials with historical control information (PubMed)
Applying Meta-Analytic-Predictive Priors with the R Bayesian evidence synthesis tools (arXiv)

The procedure is as follows:

  1. use the summary statistics from one or more previous trials to get parameter estimates
  2. estimate a multivariate normal mixture from these parameters
  3. use this parametric approximation as the prior for the new data analysis

Additionally, you can mix these priors with an uninformative component so that the new data have a stronger influence.

A full example in R can be found here: Applied Modelling in Drug Development, section 6, "Use of historical control data with stratification" (nibr.com).
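For orientation, steps 1 and 2 might look roughly like this in PyMC for a single (univariate) control parameter. This is an untested sketch: the numbers and the names y_hist, se_hist, and theta_new are made up for illustration.

import numpy as np
import pymc as pm
from sklearn.mixture import GaussianMixture

# hypothetical summary statistics from three historical control arms
y_hist = np.array([-1.2, -0.9, -1.1])    # estimated log-odds, one per trial
se_hist = np.array([0.20, 0.25, 0.15])   # corresponding standard errors

# step 1: random-effects meta-analysis of the historical trials
with pm.Model():
    mu = pm.Normal("mu", 0.0, 2.0)            # population mean
    tau = pm.HalfNormal("tau", 0.5)           # between-trial heterogeneity
    theta = pm.Normal("theta", mu, tau, shape=len(y_hist))
    pm.Normal("y", theta, se_hist, observed=y_hist)
    # predictive distribution of the control parameter in a new trial:
    # this is the MAP prior
    theta_new = pm.Normal("theta_new", mu, tau)
    idata = pm.sample(random_seed=123)

# step 2: approximate the MAP prior with a normal mixture
draws = idata.posterior["theta_new"].values.reshape(-1, 1)
gm = GaussianMixture(n_components=2).fit(draws)
print(gm.weights_, gm.means_.ravel(), gm.covariances_.ravel())

Step 3 would then use the fitted mixture parameters as the prior in the new trial's model; the robust variant additionally mixes in a weakly-informative component with some prior weight.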

I tried to reproduce something similar in PyMC, but encountered the following problems:

  1. I don’t know how to estimate the parameters of a multivariate mixture (sklearn’s GaussianMixture can fit multivariate components, but I don’t know how to carry those estimates over into PyMC)
  2. As far as I know, PyMC does not support multivariate mixtures as priors.

Has anyone ever done such an analysis in PyMC?

It does. There’s an example with Dirichlet in the code examples, but any multivariate distribution (at least MvNormal, MvStudentT) should be fine as well.

https://www.pymc.io/projects/docs/en/latest/api/distributions/generated/pymc.Mixture.html
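For example, something along these lines should work (an untested sketch; the means and covariances are arbitrary, but the covariances are positive definite):

import pymc as pm

with pm.Model():
    components = [
        pm.MvNormal.dist(mu=[0.0, 0.0], cov=[[1.0, 0.3], [0.3, 1.0]]),
        pm.MvNormal.dist(mu=[0.0, 0.0], cov=[[1.0, -0.3], [-0.3, 1.0]]),
    ]
    # a 2-vector whose density is a 50/50 mixture of the two MvNormals
    beta = pm.Mixture("beta", w=[0.5, 0.5], comp_dists=components, shape=(2,))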


Thanks, I wasn’t aware.
This is the model I came up with, but it fails to sample.

Here is a minimal example.

I want to fit a linear regression where the prior on the coefficients is a multivariate mixture of normals.

The estimated parameters should be the regression coefficients (and sigma).

import numpy as np
import pymc as pm

# dummy data
RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)
size = 200
true_intercept = 1
true_slope = 2
x = np.linspace(0, 1, size)
# y = a + b*x
true_regression_line = true_intercept + true_slope * x
# add noise
y = true_regression_line + rng.normal(scale=0.5, size=size)


with pm.Model() as model:
    # two-component multivariate normal prior on the coefficients
    components = [
        pm.MvNormal.dist(mu=[1, 0], cov=[[1, 1], [1, 1]]),
        pm.MvNormal.dist(mu=[1, 0], cov=[[1, -1], [-1, 1]])
    ]

    sigma = pm.HalfCauchy("sigma", beta=10)

    mix = pm.Mixture('mix', w=[0.5, 0.5], comp_dists=components, shape=(2,))
    # mix[0] acts as the slope, mix[1] as the intercept
    obs = pm.Normal('obs', mu=mix[0] * x + mix[1], sigma=sigma, observed=y)
    idata = pm.sample(3000)

Edit:
If I use a simpler mixture, such as

components = [
    pm.Normal.dist(mu=0, sigma=1),
    pm.Normal.dist(mu=0, sigma=1),
]

the model runs fine and the parameters are correctly estimated.

However, I want to model the correlations between the coefficients (intercept and slope in this case).

What does "fail to sample" mean?

I get this error:

File ~/miniconda3/envs/pymc/lib/python3.11/site-packages/pymc/model/core.py:1681, in Model.check_start_vals(self, start)
   1678 initial_eval = self.point_logps(point=elem)
   1680 if not all(np.isfinite(v) for v in initial_eval.values()):
→  1681     raise SamplingError(
   1682         "Initial evaluation of model at starting point failed!\n"
   1683         f"Starting values:\n{elem}\n\n"
   1684         f"Logp initial evaluation results:\n{initial_eval}\n"
   1685         "You can call model.debug() for more details."
   1686     )

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'sigma_log__': array(1.80706588), 'mix': array([ 0.35557112, -0.56247942])}

Logp initial evaluation results:
{'sigma': -1.26, 'mix': -inf, 'obs': -562.89}

You can call model.debug() for more details.

What does model.debug() output?

This is the model.debug() output:

/miniconda3/envs/pymc/lib/python3.11/site-packages/pytensor/tensor/slinalg.py:81: RuntimeWarning: invalid value encountered in cast
z[0] = (np.zeros(x.shape) * np.nan).astype(x.dtype)
The variable mix has the following parameters:
0: [0.5 0.5] [id A] <Vector(float64, shape=(2,))>
1: multivariate_normal_rv{1, (1, 2), floatX, False}.1 [id B] <Vector(float64, shape=(2,))>
├─ RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F8F825760A0>) [id C]
├─ [id D] <Vector(int64, shape=(0,))>
├─ 11 [id E] <Scalar(int64, shape=())>
├─ [1 0] [id F] <Vector(int64, shape=(2,))>
└─ [[1 1] [1 1]] [id G] <Matrix(int64, shape=(2, 2))>
2: multivariate_normal_rv{1, (1, 2), floatX, False}.1 [id H] <Vector(float64, shape=(2,))>
├─ RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F8F82577220>) [id I]
├─ [id D] <Vector(int64, shape=(0,))>
├─ 11 [id E] <Scalar(int64, shape=())>
├─ [1 0] [id F] <Vector(int64, shape=(2,))>
└─ [[ 1 -1] [-1 1]] [id J] <Matrix(int64, shape=(2, 2))>
The parameters evaluate to:
0: [0.5 0.5]
1: [1.95134106 0.95134106]
2: [ 2.04372036 -1.04372036]
This does not respect one of the following constraints: posdef

You can set verbose=True for more details

You defined MvNormal mixture components whose covariance matrices are not positive definite.
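Both covariance matrices are in fact singular (each has a zero eigenvalue), which a quick numpy check shows:

import numpy as np

for cov in ([[1, 1], [1, 1]], [[1, -1], [-1, 1]]):
    print(np.linalg.eigvalsh(cov))  # [0. 2.] -> singular, not positive definite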

Thanks, you are right. If I specify an identity covariance matrix for both components

components = [
    pm.MvNormal.dist(mu=[0, 0], cov=[[1, 0], [0, 1]]),
    pm.MvNormal.dist(mu=[0, 0], cov=[[1, 0], [0, 1]])
]

the parameters are correctly estimated:

         mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
mix[0]  1.912  0.053   1.815    2.016      0.001    0.001    5500.0    6016.0    1.0
mix[1]  1.053  0.030   0.996    1.110      0.000    0.000    5447.0    6377.0    1.0
sigma   0.486  0.011   0.466    0.507      0.000    0.000    7515.0    7152.0    1.0

The true slope is 2.0 (estimated as mix[0]) and the true intercept is 1.0 (estimated as mix[1]), so both are recovered well.
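Side note on the correlation question above: any positive definite covariance is allowed, so correlated components also work, e.g. (a sketch with arbitrary correlation values):

components = [
    pm.MvNormal.dist(mu=[0, 0], cov=[[1.0, 0.5], [0.5, 1.0]]),    # correlation +0.5
    pm.MvNormal.dist(mu=[0, 0], cov=[[1.0, -0.5], [-0.5, 1.0]]),  # correlation -0.5
]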