Computer model calibration using a Gaussian Process: from Stan to pymc

I am in the process of translating models I have written in Stan into pymc, and I am not sure of the most efficient way to implement the following model:

functions {
  vector biased_model(vector ach_v, real P, real K, int N){
    vector[N] model_out;
    model_out = ((ach_v * P) ./ (ach_v + K)) + (ach_v * 0.015);
    return model_out;
  }
}


// The input data is a vector 'y' of length 'N'.
data {
  int<lower=0> N; // Number of observations
  vector[N] y; // Observations of I/O ratio
  real ach[N]; // ACH measurements
}

transformed data{
  vector[N] ach_v = to_vector(ach);
}

parameters {
  real<lower=0> sigma;
  real<lower=0> P;
  real<lower=0> K;
  real<lower=0> rho;
  real<lower=0> alpha;
}

// The model to be estimated. We model the output
// 'y' to be normally distributed.
// The mean of this distribution is the result of
// applying the steady state equation on the measured ACH values
// given the *sampled* parameters P and K. 
// The standard deviation is the noise captured by sigma.
model {
  matrix[N, N] L_K;
  matrix[N, N] K_cov = gp_exp_quad_cov(ach, alpha, rho);
  real sq_sigma = square(sigma);

  // diagonal elements
  for (n in 1:N) {
    K_cov[n, n] = K_cov[n, n] + sq_sigma;
  }

  L_K = cholesky_decompose(K_cov);
  
  
  // priors
  rho ~ inv_gamma(5, 5);
  alpha ~ normal(0,2);
  P ~ normal(0.85, 0.2);
  K ~ normal(0.12, 0.05);
  sigma ~ exponential(10);
  
  y ~ multi_normal_cholesky(biased_model(ach_v, P, K, N), L_K);
}

What I implemented above is an example of Bayesian computer model calibration, as described in Higdon et al. (2004), where the computer model is assumed to be fast and biased. The purpose of the GP is to capture the discrepancy (\delta) between the noisy observations (y) and the computer model (\eta):

y = \eta(x, \theta) + \delta(x) + \epsilon

where x represents the known explanatory variables, \theta represents the unknown calibration parameters, and \epsilon is noise assumed to be i.i.d.
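
Concretely, in my setup, \eta(x, \theta) = xP / (x + K) + 0.015x is the fast-but-biased simulator with calibration parameters \theta = (P, K), x is the measured ACH, and the GP term \delta(x) absorbs whatever discrepancy remains between the simulator and the observations.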

I did some reading in the pymc case studies and on the forum, and thought that a marginal likelihood implementation may be the fastest, although I'm not sure how best to implement it. I had a go, as shown below; please let me know if it is sensible.

import numpy as np
import pandas as pd
import pymc as pm

# Assume that the "true" but unknown values of P and K are
P_true = 0.8 #penetration efficiency 
K_true = 0.10 #deposition rate (h-1)

# Create data
data = pd.DataFrame({"measured_ach": np.random.uniform(0.75, 1.25, 250)})
data["measured_i_o"] = data["measured_ach"].apply(lambda ach: model(ach, P=P_true, K=K_true))
sigma_par = 0.01
data["measured_i_o_noisy"] = data["measured_i_o"] + np.random.normal(loc=0, scale=sigma_par, size=len(data))

# Subset data for training
N = 49
y_train = data.loc[0:N,'measured_i_o_noisy'].to_numpy()[:, None]
ach_train = data.loc[0:N,"measured_ach"].to_numpy()[:, None]

class BiasedModel(pm.gp.mean.Mean):
    def __init__(self, P, K, ach):
        self.P = P
        self.K = K
        self.ach = ach

    def __call__(self, ach):
        return ((ach * self.P) / (ach + self.K)) + (ach * 0.015)

with pm.Model() as model_bias_marginal1:
    # Assume exponential prior with rate (or inverse scale) of 10
    sigma = pm.Exponential('sigma', lam=10)
    # Assume normal prior with mean of 0.85, standard deviation of 0.2
    P = pm.TruncatedNormal('P', mu=0.85, sigma=0.2, lower=0)
    # Assume normal prior with mean of 0.12, standard deviation of 0.05
    K = pm.TruncatedNormal('K', mu=0.12, sigma=0.05, lower=0)
    
    # Note: in the pymc docs, rho is equivalent to l (the lengthscale)
    rho = pm.InverseGamma("rho", alpha=5, beta=5)
    eta = pm.HalfNormal("eta", sigma=2)

    # Covariance function for the model bias GP
    cov = eta**2 * pm.gp.cov.ExpQuad(1, rho)
    
    # Specify biased model output as mean to the GP marginal
    gp = pm.gp.Marginal(mean_func=BiasedModel(P, K, ach_train), cov_func=cov)

    likelihood = gp.marginal_likelihood('y_obs', sigma=sigma, y=y_train, X=ach_train)

with model_bias_marginal1:
    # draw 500 posterior samples
    idata_model_bias_marginal1 = pm.sample(tune=500, draws=500, cores=2, chains=2, random_seed=456)

Another approach, which is likely slower but closer to my original Stan model, is:

def BiasedModel(ach, P, K):
    return ((ach * P) / (ach + K)) + (ach*0.015)

# MvNormal likelihood implementation
with pm.Model() as model_bias_mvn:
    
    ###### Parameter Priors ##########
    # Assume exponential prior with rate (or inverse scale) of 10
    sigma = pm.Exponential('sigma', lam=10)
    # Assume normal prior with mean of 0.85, standard deviation of 0.2
    P = pm.TruncatedNormal('P', mu=0.85, sigma=0.2, lower=0)
    # Assume normal prior with mean of 0.12, standard deviation of 0.05
    K = pm.TruncatedNormal('K', mu=0.12, sigma=0.05, lower=0)
    
    ###### Hyperparameter Priors ######
    # Note: in the pymc docs, rho is equivalent to l (the lengthscale)
    rho = pm.InverseGamma("rho", alpha=5, beta=5)
    eta = pm.HalfNormal("eta", sigma=2)

    ###### Model prediction #####
    mu = pm.Deterministic("mu", BiasedModel(ach_train, P, K))
    
    # Define covariance function for the model bias and noise
    cov_delta = eta**2 * pm.gp.cov.ExpQuad(1, rho) 
    cov_epsilon = pm.gp.cov.WhiteNoise(sigma)
    
    # Construct single covariance matrix
    Sigma_comb = cov_delta(X=ach_train) + cov_epsilon(X=ach_train)

    y_obs = pm.MvNormal('y_obs', mu=mu, cov=Sigma_comb, observed=y_train)

with model_bias_mvn:
    # draw 250 posterior samples
    idata_model_bias_mvn = pm.sample(tune=500, draws=250, cores=2, chains=2, random_seed=456)

My questions:

  • Any advice on how to best define this model in pymc? Is there an alternative formulation that you would recommend?
  • Am I missing something in either of my attempts that could speed things up, or do either of them seem wrong to you?
  • In particular, with regard to my second attempt, can I take advantage of the Cholesky decomposition (or is that already implemented within the MvNormal function)?

I appreciate that I am asking a lot of questions; an answer to any of them would be much appreciated!

For 1: it looks like your ach measurements are one-dimensional (like a time series). I think you could then get a massive speedup using the HSGP approximation. Unfortunately there isn't an example of it in the docs yet, but check the docstrings there for usage. It's basically a drop-in replacement for gp.Latent (when your inputs are low-dimensional, 1 or 2, maybe 3, and your covariance is one of ExpQuad, Matern52, or Matern32, currently). See the sketch below.
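
Something like this (a rough, untested sketch). Since HSGP behaves like gp.Latent, the discrepancy enters as a zero-mean GP and the biased model output is added outside of it; the m and c arguments (number of basis functions and boundary factor) are guesses you would want to tune per the docstring:

with pm.Model() as model_bias_hsgp:
    sigma = pm.Exponential("sigma", lam=10)
    P = pm.TruncatedNormal("P", mu=0.85, sigma=0.2, lower=0)
    K = pm.TruncatedNormal("K", mu=0.12, sigma=0.05, lower=0)
    rho = pm.InverseGamma("rho", alpha=5, beta=5)
    eta = pm.HalfNormal("eta", sigma=2)

    cov = eta**2 * pm.gp.cov.ExpQuad(1, rho)

    # HSGP approximates a latent zero-mean GP; m sets the number of basis
    # functions, c the boundary factor (both need tuning)
    gp = pm.gp.HSGP(m=[30], c=1.5, cov_func=cov)
    delta = gp.prior("delta", X=ach_train)

    # the biased model output is added outside the GP, since the GP is zero-mean
    ach = ach_train.flatten()
    mu = (ach * P) / (ach + K) + ach * 0.015

    y_obs = pm.Normal("y_obs", mu=mu + delta, sigma=sigma, observed=y_train.flatten())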

For 2 and 3: no, nothing jumps out at me as being slow. The way you implemented it directly using MvNormal is essentially what pm.gp.Marginal does under the hood, though the latter takes care of a few more minor implementation details (adding jitter, doing predictions). I would expect them to run at about the same speed, and MvNormal does use the Cholesky factor to calculate the log probability.
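
If you want to mirror your Stan multi_normal_cholesky line explicitly, you can also factor the covariance yourself and pass chol to MvNormal; it should not change the speed, since MvNormal performs an equivalent factorization internally when given cov. A sketch, slotting into your second model:

from pytensor.tensor.slinalg import cholesky

# inside the model block, replacing the cov= argument with an explicit factor
L = cholesky(Sigma_comb)
y_obs = pm.MvNormal('y_obs', mu=mu.flatten(), chol=L, observed=y_train.flatten())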

Also keep in mind that better and/or more informative priors can dramatically speed up sampling (a quick prior predictive check, sketched below, is a cheap way to vet them).
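
For example, a sketch reusing your first model (assuming arviz is imported as az):

import arviz as az

with model_bias_marginal1:
    # draw parameter values from the priors and push them through the model
    prior_pred = pm.sample_prior_predictive(500, random_seed=456)

# compare the prior predictive draws against the observed I/O ratios
az.plot_ppc(prior_pred, group="prior");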

Thanks for your reply and your suggestions. I will look into HSGP in the next few days.

Over the last couple of days, I have been working on this with the following basic formulation:

# Defining a custom mean to represent the biased model predictions
class BiasedModel(pm.gp.mean.Mean):
    def __init__(self, P, K):
        self.P = P
        self.K = K

    def __call__(self, X):
        return (X * self.P) / (X + self.K) + (X*0.015)

N = 49

y_train = data.loc[0:N,'measured_i_o_noisy'].to_numpy()[:, None]
ach_train = data.loc[0:N,"measured_ach"].to_numpy()[:, None]


with pm.Model() as model_bias_marginal1:
    # Assume exponential prior with rate (or inverse scale) of 10
    sigma = pm.Exponential('sigma', lam=10)
    # Assume normal prior with mean of 0.85, standard deviation of 0.2
    P = pm.TruncatedNormal('P', mu=0.85, sigma=0.2, lower=0)
    # Assume normal prior with mean of 0.12, standard deviation of 0.05
    K = pm.TruncatedNormal('K', mu=0.12, sigma=0.05, lower=0)
    
    # Note: in the pymc docs, rho is equivalent to l (the lengthscale)
    rho = pm.InverseGamma("rho", alpha=5, beta=5)
    eta = pm.HalfNormal("eta", sigma=1)

    # Covariance function for the model bias GP
    cov = eta**2 * pm.gp.cov.ExpQuad(1, rho)
    
    gp = pm.gp.Marginal(mean_func=BiasedModel(P=P, K=K), cov_func=cov)

    likelihood = gp.marginal_likelihood('y_obs', sigma=sigma, y=y_train, X=ach_train)

with model_bias_marginal1:
    # draw 1000 posterior samples
    idata_model_bias_marginal1 = pm.sample(tune=1000, draws=500, cores=2, chains=2, random_seed=456)

### Convergence ####
# Convergence check 1: Visually inspecting traceplots
az.plot_trace(idata_model_bias_marginal1, var_names = ["P", "K", "sigma"], combined=True);
# Convergence check 2: Rhat and n_eff
az.summary(idata_model_bias_marginal1, var_names = ["P", "K", "sigma", "rho", "eta"], round_to=2)

With the true values being,

P = 0.8 
K = 0.10 
sigma = 0.01

After trying different priors, increasing the number of observations, and standardising the BiasedModel output and observations, I always get a sigma of zero instead of 0.01. I even tried using the WhiteNoise cov (sketch below), although I understand it to be the same as specifying sigma in gp.marginal_likelihood.
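
For reference, the WhiteNoise variant I tried looks like this (as far as I understand, pm.gp.Marginal accepts a Covariance object for the noise term and wraps a scalar sigma in WhiteNoise internally, so the two should be equivalent):

# replacing the scalar-sigma likelihood with an explicit WhiteNoise covariance
noise_cov = pm.gp.cov.WhiteNoise(sigma)
likelihood = gp.marginal_likelihood('y_obs', X=ach_train, y=y_train, sigma=noise_cov)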

         mean      sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P       0.782   0.012   0.759    0.805      0.001    0.000   324.510   367.917  1.011
K       0.096   0.017   0.065    0.129      0.001    0.001   300.704   379.986  1.011
sigma   0.000   0.000   0.000    0.000      0.000    0.000   294.313   291.642  1.008
rho    41.408  18.022  17.080   71.489      0.997    0.845   602.177   481.357  1.002
eta     0.011   0.001   0.009    0.013      0.000    0.000   575.593   367.292  1.009

P and K are not that bad, but sigma is always 0. Running the equivalent model in Stan resulted in the following estimates:

       mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
P     0.80       0 0.11 0.59 0.75 0.78 0.84  1.04  1506 1.00
K     0.10       0 0.04 0.03 0.07 0.10 0.13  0.20   836 1.01
sigma 0.01       0 0.00 0.01 0.01 0.01 0.01  0.01   206 1.02

So far, I have assumed that this was due to a mistake in my pymc formulation. Could it be something else? I randomly generated the data in R (for Stan) and in Python for these tests, using the same bounds for the uniform distribution in both. Could the difference be due to random variability alone?

Nothing's jumping out at me in your PyMC code that would cause sigma to be underestimated relative to the Stan version; the two look the same to me, but I might be missing something. I tried running your example, but I get an error on this line: data["measured_i_o"] = data["measured_ach"].apply(lambda ach: model(ach, P=P_true, K=K_true)). Would it be possible to try with a larger sigma? That might make it easier to compare.

Apologies for providing an incomplete example; I somehow left out the definition of the model function. Please see below for a complete example:

import numpy as np
import pandas as pd
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt

def model(ach, P, K):
    i_o = ((ach*P)/(ach+K))
    return i_o

# Assume that the "true" but unknown values of P and K are
P_true = 0.8 #penetration efficiency 
K_true = 0.10 #deposition rate (h-1)

# Create data
data = pd.DataFrame({"measured_ach": np.random.uniform(0.75, 1.25, 250)})
data["measured_i_o"] = data["measured_ach"].apply(lambda ach: model(ach, P=P_true, K=K_true))
sigma_par = 0.01
data["measured_i_o_noisy"] = data["measured_i_o"] + np.random.normal(loc=0, scale=sigma_par, size=len(data))

# Add bias to the measured_i_o column
data['measured_i_o_noisy_bias'] = data['measured_i_o'] + data['measured_ach'] * 0.015

# Plot the data
plt.figure(figsize=(10, 6))
plt.scatter(x=data['measured_ach'], y=data['measured_i_o'], label='True', color="black")
plt.scatter(x=data['measured_ach'], y=data['measured_i_o_noisy'], label='Noisy', color="red")
plt.scatter(x=data['measured_ach'], y=data['measured_i_o_noisy_bias'], label='True + Bias', color="blue")
plt.xlabel('Measured ACH [1/h]')
plt.ylabel('Measured I/O Ratio')
plt.legend(loc='upper left')
plt.show()


# Defining a custom mean to represent the biased model predictions
class BiasedModel(pm.gp.mean.Mean):
    def __init__(self, P, K):
        self.P = P
        self.K = K

    def __call__(self, X):
        return (X * self.P) / (X + self.K) + (X*0.015)

N = 49

y_train = data.loc[0:N,'measured_i_o_noisy'].to_numpy()[:, None]
ach_train = data.loc[0:N,"measured_ach"].to_numpy()[:, None]


with pm.Model() as model_bias_marginal1:
    # Assume exponential prior with rate (or inverse scale) of 10
    sigma = pm.Exponential('sigma', lam=10)
    # Assume normal prior with mean of 0.85, standard deviation of 0.2
    P = pm.TruncatedNormal('P', mu=0.85, sigma=0.2, lower=0)
    # Assume normal prior with mean of 0.12, standard deviation of 0.05
    K = pm.TruncatedNormal('K', mu=0.12, sigma=0.05, lower=0)
    
    # Note: in the pymc docs, rho is equivalent to l (the lengthscale)
    rho = pm.InverseGamma("rho", alpha=5, beta=5)
    eta = pm.HalfNormal("eta", sigma=1)

    # Covariance function for the model bias GP
    cov = eta**2 * pm.gp.cov.ExpQuad(1, rho)
    
    gp = pm.gp.Marginal(mean_func=BiasedModel(P=P, K=K), cov_func=cov)

    likelihood = gp.marginal_likelihood('y_obs', sigma=sigma, y=y_train, X=ach_train)

with model_bias_marginal1:
    # draw 1000 posterior samples
    idata_model_bias_marginal1 = pm.sample(tune=1000, draws=500, cores=2, chains=2, random_seed=456)

### Convergence ####
# Convergence check 1: Visually inspecting traceplots
az.plot_trace(idata_model_bias_marginal1, var_names = ["P", "K", "sigma"], combined=True);
# Convergence check 2: Rhat and n_eff
az.summary(idata_model_bias_marginal1, var_names = ["P", "K", "sigma", "rho", "eta"], round_to=2)

I tried the suggestion of varying \sigma, and what I found is interesting, at least to me.

True \sigma = 0.01:

        mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P       0.77   0.01    0.75     0.79       0.00     0.00    445.13    390.49   1.01
K       0.08   0.01    0.05     0.10       0.00     0.00    439.76    403.88   1.01
sigma   0.00   0.00    0.00     0.00       0.00     0.00    458.21    333.47   1.00
rho    33.39  16.53   13.38    58.06       0.76     0.56    586.18    495.53   1.00
eta     0.01   0.00    0.01     0.01       0.00     0.00    508.92    482.28   1.00

True \sigma = 0.05:

         mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P        0.79   0.03    0.72     0.84       0.00     0.00    219.98    148.02   1.01
K        0.10   0.04    0.02     0.18       0.00     0.00    218.65    239.43   1.01
sigma    0.00   0.00    0.00     0.00       0.00     0.00    325.92    163.95   1.00
rho    207.92  90.94   89.19   353.05       4.58     3.24    341.75    351.22   1.01
eta      0.06   0.01    0.04     0.07       0.00     0.00    407.19    427.03   1.00

True \sigma = 0.10:

         mean      sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P        0.79    0.04    0.72     0.86       0.00      0.0    294.56    391.29    1.0
K        0.12    0.05    0.03     0.22       0.00      0.0    252.07    225.04    1.0
sigma    0.00    0.00    0.00     0.00       0.00      0.0    698.40    403.64    1.0
rho    389.40  166.82  143.72   699.65       7.12      5.4    670.56    466.75    1.0
eta      0.11    0.01    0.09     0.12       0.00      0.0    748.37    658.60    1.0

In summary, from the results:

  • As the true value of \sigma increases, the posterior means of \eta and \rho (equivalent to l) increase.
  • The posterior mean of \eta seems to track the true value of \sigma.
  • The posterior of \sigma remains at zero.

That looks very odd. What happens when you play around with the prior for sigma? E.g. using a half-normal.

Not a lot changes. Assuming \sigma \sim HalfNormal(0.2):

True \sigma = 0.01:

        mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P       0.78   0.01    0.76     0.81       0.00     0.00    441.82    368.99   1.01
K       0.09   0.02    0.06     0.13       0.00     0.00    449.79    424.92   1.01
sigma   0.00   0.00    0.00     0.00       0.00     0.00    439.62    362.82   1.01
rho    29.10  13.76   11.87    49.56       0.56     0.45    813.94    579.28   1.00
eta     0.01   0.00    0.01     0.01       0.00     0.00    586.94    420.25   1.00

True \sigma = 0.05:

         mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P        0.79   0.03    0.73     0.84       0.00     0.00    294.47    331.93   1.03
K        0.12   0.04    0.06     0.20       0.00     0.00    293.08    294.46   1.02
sigma    0.00   0.00    0.00     0.00       0.00     0.00    376.89    310.29   1.00
rho    147.81  57.71   58.03   242.88       2.99     2.12    481.07    450.83   1.00
eta      0.05   0.00    0.04     0.06       0.00     0.00    449.36    566.99   1.01

True \sigma = 0.10:

         mean      sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P        0.82    0.04    0.74     0.89       0.00     0.00    295.21    258.56    1.0
K        0.11    0.05    0.03     0.21       0.00     0.00    306.47    244.42    1.0
sigma    0.00    0.00    0.00     0.00       0.00     0.00    314.85    203.01    1.0
rho    342.86  176.65  133.03   632.52       9.29     6.64    503.39    370.84    1.0
eta      0.11    0.01    0.09     0.13       0.00     0.00    484.31    283.64    1.0

Hi @giorgp, want to let you know I’ve been messing with this a bit but don’t have an answer yet. Will let you know!


Hi @bwengals, great, thanks for letting me know and for your time! Looking forward to any updates you might have.

Finally found it; it turns out it was something small.

class BiasedModel(pm.gp.mean.Mean):
    def __init__(self, P, K):
        self.P = P
        self.K = K

    def __call__(self, X):
        return ((X * self.P) / (X + self.K) + (X*0.015)).flatten() ## flatten this out here

and then also flatten y here:

    likelihood = gp.marginal_likelihood('y_obs', sigma=sigma, y=y_train.flatten(), X=ach_train)

The same fix applies going from Latent to Marginal or any other GP implementation, but I think something has changed slightly in MvNormal to cause this.
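
If it helps to see the mechanism, the shape problem is reproducible with plain numpy: a column vector broadcast against a length-N vector silently produces an (N, N) batch rather than a single vector (illustrative sketch):

import numpy as np

mu = np.zeros((50, 1))  # what the un-flattened mean function returns

# broadcasting against a length-50 vector (like the MvNormal event dimension)
# silently yields a (50, 50) batch instead of one length-50 vector
print((mu + np.zeros(50)).shape)  # -> (50, 50)
print(mu.flatten().shape)         # -> (50,), the shape the GP code expects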


Yes, that solved it! Thanks for your help!