Computer model calibration using a Gaussian Process: from Stan to pymc

I am in the process of translating models I have written in Stan into pymc, and I am not sure of the most efficient way to implement the following model:

functions {
  vector biased_model(vector ach_v, real P, real K, int N){
    vector[N] model_out;
    model_out = ((ach_v * P) ./ (ach_v + K)) + (ach_v * 0.015);
    return model_out;
  }
}


// The input data is a vector 'y' of length 'N'.
data {
  int<lower=0> N; // Number of observations
  vector[N] y; // Observations of I/O ratio
  real ach[N]; // ACH measurements
}

transformed data{
  vector[N] ach_v = to_vector(ach);
}

parameters {
  real<lower=0> sigma;
  real<lower=0> P;
  real<lower=0> K;
  real<lower=0> rho;
  real<lower=0> alpha;
}

// The model to be estimated. We model the output
// 'y' to be normally distributed.
// The mean of this distribution is the result of
// applying the steady state equation on the measured ACH values
// given the *sampled* parameters P and K. 
// The standard deviation is the noise captured by sigma.
model {
  matrix[N, N] L_K;
  matrix[N, N] K_cov = gp_exp_quad_cov(ach, alpha, rho);
  real sq_sigma = square(sigma);

  // diagonal elements
  for (n in 1:N) {
    K_cov[n, n] = K_cov[n, n] + sq_sigma;
  }

  L_K = cholesky_decompose(K_cov);
  
  
  // priors
  rho ~ inv_gamma(5, 5);
  alpha ~ normal(0,2);
  P ~ normal(0.85, 0.2);
  K ~ normal(0.12, 0.05);
  sigma ~ exponential(10);
  
  y ~ multi_normal_cholesky(biased_model(ach_v, P, K, N), L_K);
}

What I implemented above is an example of Bayesian computer model calibration, as described in Higdon et al. (2004), where the computer model is assumed to be fast and biased. The purpose of the GP is to capture the discrepancy (\delta) between the noisy observations (y) and the computer model (\eta):

y = \eta(x, \theta) + \delta(x) + \epsilon

where x represents the known explanatory variables, \theta represents the unknown calibration parameters, and \epsilon is noise assumed to be i.i.d.
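
Concretely, in my setup, \eta(x, \theta) = xP / (x + K) + 0.015x is the fast-but-biased simulator with calibration parameters \theta = (P, K), x is the measured ACH, and the GP term \delta(x) absorbs whatever discrepancy remains between the simulator and the observations.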

I did some reading in the pymc case studies and on the forum, and thought that a marginal likelihood implementation may be the fastest, although I'm not sure how best to implement it. I had a go, as shown below; please let me know if it is sensible.

import numpy as np
import pandas as pd
import pymc as pm

# Assume that the "true" but unknown values of P and K are
P_true = 0.8 #penetration efficiency 
K_true = 0.10 #deposition rate (h-1)

# Create data
data = pd.DataFrame({"measured_ach": np.random.uniform(0.75, 1.25, 250)})
data["measured_i_o"] = data["measured_ach"].apply(lambda ach: model(ach, P=P_true, K=K_true))
sigma_par = 0.01
data["measured_i_o_noisy"] = data["measured_i_o"] + np.random.normal(loc=0, scale=sigma_par, size=len(data))

# Subset data for training
N = 49
y_train = data.loc[0:N,'measured_i_o_noisy'].to_numpy()[:, None]
ach_train = data.loc[0:N,"measured_ach"].to_numpy()[:, None]

class BiasedModel(pm.gp.mean.Mean):
    def __init__(self, P, K, ach):
        self.P = P
        self.K = K
        self.ach = ach

    def __call__(self, ach):
        return ((ach * self.P) / (ach + self.K)) + (ach * 0.015)

with pm.Model() as model_bias_marginal1:
    # Assume exponential prior with rate (or inverse scale) of 10
    sigma = pm.Exponential('sigma', lam=10)
    # Assume normal prior with mean of 0.85, standard deviation of 0.2
    P = pm.TruncatedNormal('P', mu=0.85, sigma=0.2, lower=0)
    # Assume normal prior with mean of 0.12, standard deviation of 0.05
    K = pm.TruncatedNormal('K', mu=0.12, sigma=0.05, lower=0)
    
    # Note: in the pymc docs, rho is equivalent to l (the lengthscale)
    rho = pm.InverseGamma("rho", alpha=5, beta=5)
    eta = pm.HalfNormal("eta", sigma=2)

    # Covariance function for the model bias GP
    cov = eta**2 * pm.gp.cov.ExpQuad(1, rho)
    
    # Specify biased model output as mean to the GP marginal
    gp = pm.gp.Marginal(mean_func=BiasedModel(P, K, ach_train), cov_func=cov)

    likelihood = gp.marginal_likelihood('y_obs', sigma=sigma, y=y_train, X=ach_train)

with model_bias_marginal1:
    # draw 500 posterior samples
    idata_model_bias_marginal1 = pm.sample(tune=500, draws=500, cores=2, chains=2, random_seed=456)

Another approach, which is likely slower but closer to my original Stan model, is:

def BiasedModel(ach, P, K):
    return ((ach * P) / (ach + K)) + (ach*0.015)

# MvNormal likelihood implementation
with pm.Model() as model_bias_mvn:
    
    ###### Parameter Priors ##########
    # Assume exponential prior with rate (or inverse scale) of 10
    sigma = pm.Exponential('sigma', lam=10)
    # Assume normal prior with mean of 0.85, standard deviation of 0.2
    P = pm.TruncatedNormal('P', mu=0.85, sigma=0.2, lower=0)
    # Assume normal prior with mean of 0.12, standard deviation of 0.05
    K = pm.TruncatedNormal('K', mu=0.12, sigma=0.05, lower=0)
    
    ###### Hyperparameter Priors ######
    # Note: in the pymc docs, rho is equivalent to l (the lengthscale)
    rho = pm.InverseGamma("rho", alpha=5, beta=5)
    eta = pm.HalfNormal("eta", sigma=2)

    ###### Model prediction #####
    mu = pm.Deterministic("mu", BiasedModel(ach_train, P, K))
    
    # Define covariance function for the model bias and noise
    cov_delta = eta**2 * pm.gp.cov.ExpQuad(1, rho) 
    cov_epsilon = pm.gp.cov.WhiteNoise(sigma)
    
    # Construct single covariance matrix
    Sigma_comb = cov_delta(X=ach_train) + cov_epsilon(X=ach_train)

    y_obs = pm.MvNormal('y_obs', mu=mu, cov=Sigma_comb, observed=y_train)

with model_bias_mvn:
    # draw 250 posterior samples
    idata_model_bias_mvn = pm.sample(tune=500, draws=250, cores=2, chains=2, random_seed=456)

My questions:

  • Any advice on how to best define this model in pymc? Is there an alternative formulation that you would recommend?
  • Am I missing something in either of my attempts that could speed things up, or do either of them seem wrong to you?
  • In particular, with regard to my second attempt, can I take advantage of the Cholesky decomposition (or is that already implemented within the MvNormal function)?

I appreciate that I am asking a lot of questions; an answer to any of them would be much appreciated!

For 1: it looks like your ach measurements are one-dimensional (like a time series). I think you could then get a massive speedup using the HSGP approximation. Unfortunately there isn't an example of it in the docs yet, but check the docstrings there for usage. It's basically a drop-in replacement for gp.Latent (when your inputs are low-dimensional, 1 or 2, maybe 3, and your covariance is one of ExpQuad, Matern52, or Matern32, currently). See the sketch below.
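
Something like this (a rough, untested sketch). Since HSGP behaves like gp.Latent, the discrepancy enters as a zero-mean GP and the biased model output is added outside of it; the m and c arguments (number of basis functions and boundary factor) are guesses you would want to tune per the docstring:

with pm.Model() as model_bias_hsgp:
    sigma = pm.Exponential("sigma", lam=10)
    P = pm.TruncatedNormal("P", mu=0.85, sigma=0.2, lower=0)
    K = pm.TruncatedNormal("K", mu=0.12, sigma=0.05, lower=0)
    rho = pm.InverseGamma("rho", alpha=5, beta=5)
    eta = pm.HalfNormal("eta", sigma=2)

    cov = eta**2 * pm.gp.cov.ExpQuad(1, rho)

    # HSGP approximates a latent zero-mean GP; m sets the number of basis
    # functions, c the boundary factor (both need tuning)
    gp = pm.gp.HSGP(m=[30], c=1.5, cov_func=cov)
    delta = gp.prior("delta", X=ach_train)

    # the biased model output is added outside the GP, since the GP is zero-mean
    ach = ach_train.flatten()
    mu = (ach * P) / (ach + K) + ach * 0.015

    y_obs = pm.Normal("y_obs", mu=mu + delta, sigma=sigma, observed=y_train.flatten())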

For 2 and 3: no, nothing jumps out at me as being slow. The way you implemented it directly using MvNormal is essentially what pm.gp.Marginal does under the hood, though the latter takes care of a few more minor implementation details (adding jitter, doing predictions). I would expect them to run at about the same speed, and MvNormal does use the Cholesky factor to calculate the log probability.
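
If you want to mirror your Stan multi_normal_cholesky line explicitly, you can also factor the covariance yourself and pass chol to MvNormal; it should not change the speed, since MvNormal performs an equivalent factorization internally when given cov. A sketch, slotting into your second model:

from pytensor.tensor.slinalg import cholesky

# inside the model block, replacing the cov= argument with an explicit factor
L = cholesky(Sigma_comb)
y_obs = pm.MvNormal('y_obs', mu=mu.flatten(), chol=L, observed=y_train.flatten())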

Also keep in mind that better and/or more informative priors can dramatically speed up sampling (a quick prior predictive check, sketched below, is a cheap way to vet them).
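
For example, a sketch reusing your first model (assuming arviz is imported as az):

import arviz as az

with model_bias_marginal1:
    # draw parameter values from the priors and push them through the model
    prior_pred = pm.sample_prior_predictive(500, random_seed=456)

# compare the prior predictive draws against the observed I/O ratios
az.plot_ppc(prior_pred, group="prior");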

Thanks for your reply and your suggestions. I will look into HSGP in the next few days.

Over the last couple of days, I have been working on this with the following basic formulation:

# Defining a custom mean to represent the biased model predictions
class BiasedModel(pm.gp.mean.Mean):
    def __init__(self, P, K):
        self.P = P
        self.K = K

    def __call__(self, X):
        return (X * self.P) / (X + self.K) + (X*0.015)

N = 49

y_train = data.loc[0:N,'measured_i_o_noisy'].to_numpy()[:, None]
ach_train = data.loc[0:N,"measured_ach"].to_numpy()[:, None]


with pm.Model() as model_bias_marginal1:
    # Assume exponential prior with rate (or inverse scale) of 10
    sigma = pm.Exponential('sigma', lam=10)
    # Assume normal prior with mean of 0.85, standard deviation of 0.2
    P = pm.TruncatedNormal('P', mu=0.85, sigma=0.2, lower=0)
    # Assume normal prior with mean of 0.12, standard deviation of 0.05
    K = pm.TruncatedNormal('K', mu=0.12, sigma=0.05, lower=0)
    
    # Note: in the pymc docs, rho is equivalent to l (the lengthscale)
    rho = pm.InverseGamma("rho", alpha=5, beta=5)
    eta = pm.HalfNormal("eta", sigma=1)

    # Covariance function for the model bias GP
    cov = eta**2 * pm.gp.cov.ExpQuad(1, rho)
    
    gp = pm.gp.Marginal(mean_func=BiasedModel(P=P, K=K), cov_func=cov)

    likelihood = gp.marginal_likelihood('y_obs', sigma=sigma, y=y_train, X=ach_train)

with model_bias_marginal1:
    # draw 1000 posterior samples
    idata_model_bias_marginal1 = pm.sample(tune=1000, draws=500, cores=2, chains=2, random_seed=456)

### Convergence ####
# Convergence check 1: Visually inspecting traceplots
az.plot_trace(idata_model_bias_marginal1, var_names = ["P", "K", "sigma"], combined=True);
# Convergence check 2: Rhat and n_eff
az.summary(idata_model_bias_marginal1, var_names = ["P", "K", "sigma", "rho", "eta"], round_to=2)

With the true values being,

P = 0.8 
K = 0.10 
sigma = 0.01

After trying different priors, increasing the number of observations, and standardising the BiasedModel output and observations, I always get a sigma of zero instead of 0.01. I even tried using the WhiteNoise cov (sketch below), although I understand it to be the same as specifying sigma in gp.marginal_likelihood.
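
For reference, the WhiteNoise variant I tried looks like this (as far as I understand, pm.gp.Marginal accepts a Covariance object for the noise term and wraps a scalar sigma in WhiteNoise internally, so the two should be equivalent):

# replacing the scalar-sigma likelihood with an explicit WhiteNoise covariance
noise_cov = pm.gp.cov.WhiteNoise(sigma)
likelihood = gp.marginal_likelihood('y_obs', X=ach_train, y=y_train, sigma=noise_cov)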

         mean      sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P       0.782   0.012   0.759    0.805      0.001    0.000   324.510   367.917  1.011
K       0.096   0.017   0.065    0.129      0.001    0.001   300.704   379.986  1.011
sigma   0.000   0.000   0.000    0.000      0.000    0.000   294.313   291.642  1.008
rho    41.408  18.022  17.080   71.489      0.997    0.845   602.177   481.357  1.002
eta     0.011   0.001   0.009    0.013      0.000    0.000   575.593   367.292  1.009

P and K are not that bad, but sigma is always 0. Running the equivalent model in Stan resulted in the following estimates:

       mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
P     0.80       0 0.11 0.59 0.75 0.78 0.84  1.04  1506 1.00
K     0.10       0 0.04 0.03 0.07 0.10 0.13  0.20   836 1.01
sigma 0.01       0 0.00 0.01 0.01 0.01 0.01  0.01   206 1.02

So far, I have assumed that this was due to a mistake in my pymc formulation. Could it be something else? I randomly generated the data in R (for Stan) and in Python for these tests, using the same bounds for the uniform distribution in both. Could the difference be due to random variability alone?

Nothing's jumping out at me in your PyMC code that would cause sigma to be underestimated relative to the Stan version; the two look the same to me, but I might be missing something. I tried running your example, but I get an error on this line: data["measured_i_o"] = data["measured_ach"].apply(lambda ach: model(ach, P=P_true, K=K_true)). Would it be possible to try with a larger sigma? That might make it easier to compare.

Apologies for providing an incomplete example; I somehow left out the definition of the model function. Please see below for a complete example:

import numpy as np
import pandas as pd
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt

def model(ach, P, K):
    i_o = ((ach*P)/(ach+K))
    return i_o

# Assume that the "true" but unknown values of P and K are
P_true = 0.8 #penetration efficiency 
K_true = 0.10 #deposition rate (h-1)

# Create data
data = pd.DataFrame({"measured_ach": np.random.uniform(0.75, 1.25, 250)})
data["measured_i_o"] = data["measured_ach"].apply(lambda ach: model(ach, P=P_true, K=K_true))
sigma_par = 0.01
data["measured_i_o_noisy"] = data["measured_i_o"] + np.random.normal(loc=0, scale=sigma_par, size=len(data))

# Add bias to the measured_i_o column
data['measured_i_o_noisy_bias'] = data['measured_i_o'] + data['measured_ach'] * 0.015

# Plot the data
plt.figure(figsize=(10, 6))
plt.scatter(x=data['measured_ach'], y=data['measured_i_o'], label='True', color="black")
plt.scatter(x=data['measured_ach'], y=data['measured_i_o_noisy'], label='Noisy', color="red")
plt.scatter(x=data['measured_ach'], y=data['measured_i_o_noisy_bias'], label='True + Bias', color="blue")
plt.xlabel('Measured ACH [1/h]')
plt.ylabel('Measured I/O Ratio')
plt.legend(loc='upper left')
plt.show()


# Defining a custom mean to represent the biased model predictions
class BiasedModel(pm.gp.mean.Mean):
    def __init__(self, P, K):
        self.P = P
        self.K = K

    def __call__(self, X):
        return (X * self.P) / (X + self.K) + (X*0.015)

N = 49

y_train = data.loc[0:N,'measured_i_o_noisy'].to_numpy()[:, None]
ach_train = data.loc[0:N,"measured_ach"].to_numpy()[:, None]


with pm.Model() as model_bias_marginal1:
    # Assume exponential prior with rate (or inverse scale) of 10
    sigma = pm.Exponential('sigma', lam=10)
    # Assume normal prior with mean of 0.85, standard deviation of 0.2
    P = pm.TruncatedNormal('P', mu=0.85, sigma=0.2, lower=0)
    # Assume normal prior with mean of 0.12, standard deviation of 0.05
    K = pm.TruncatedNormal('K', mu=0.12, sigma=0.05, lower=0)
    
    # Note: in the pymc docs, rho is equivalent to l (the lengthscale)
    rho = pm.InverseGamma("rho", alpha=5, beta=5)
    eta = pm.HalfNormal("eta", sigma=1)

    # Covariance function for the model bias GP
    cov = eta**2 * pm.gp.cov.ExpQuad(1, rho)
    
    gp = pm.gp.Marginal(mean_func=BiasedModel(P=P, K=K), cov_func=cov)

    likelihood = gp.marginal_likelihood('y_obs', sigma=sigma, y=y_train, X=ach_train)

with model_bias_marginal1:
    # draw 1000 posterior samples
    idata_model_bias_marginal1 = pm.sample(tune=1000, draws=500, cores=2, chains=2, random_seed=456)

### Convergence ####
# Convergence check 1: Visually inspecting traceplots
az.plot_trace(idata_model_bias_marginal1, var_names = ["P", "K", "sigma"], combined=True);
# Convergence check 2: Rhat and n_eff
az.summary(idata_model_bias_marginal1, var_names = ["P", "K", "sigma", "rho", "eta"], round_to=2)

I tried the suggestion of varying \sigma, and what I found is interesting, at least to me.

True \sigma = 0.01:

        mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P       0.77   0.01    0.75     0.79       0.00     0.00    445.13    390.49   1.01
K       0.08   0.01    0.05     0.10       0.00     0.00    439.76    403.88   1.01
sigma   0.00   0.00    0.00     0.00       0.00     0.00    458.21    333.47   1.00
rho    33.39  16.53   13.38    58.06       0.76     0.56    586.18    495.53   1.00
eta     0.01   0.00    0.01     0.01       0.00     0.00    508.92    482.28   1.00

True \sigma = 0.05:

         mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P        0.79   0.03    0.72     0.84       0.00     0.00    219.98    148.02   1.01
K        0.10   0.04    0.02     0.18       0.00     0.00    218.65    239.43   1.01
sigma    0.00   0.00    0.00     0.00       0.00     0.00    325.92    163.95   1.00
rho    207.92  90.94   89.19   353.05       4.58     3.24    341.75    351.22   1.01
eta      0.06   0.01    0.04     0.07       0.00     0.00    407.19    427.03   1.00

True \sigma = 0.10:

         mean      sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P        0.79    0.04    0.72     0.86       0.00      0.0    294.56    391.29    1.0
K        0.12    0.05    0.03     0.22       0.00      0.0    252.07    225.04    1.0
sigma    0.00    0.00    0.00     0.00       0.00      0.0    698.40    403.64    1.0
rho    389.40  166.82  143.72   699.65       7.12      5.4    670.56    466.75    1.0
eta      0.11    0.01    0.09     0.12       0.00      0.0    748.37    658.60    1.0

In summary, from the results:

  • As the true value of \sigma increases, the posterior means of \eta and \rho (equivalent to l) increase.
  • The posterior mean of \eta seems to track the true value of \sigma.
  • The posterior of \sigma remains at zero.

That looks very odd. What happens when you play around with the prior for sigma? E.g. using a half-normal.

Not a lot changes. Assuming \sigma \sim HalfNormal(0.2):

True \sigma = 0.01:

        mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P       0.78   0.01    0.76     0.81       0.00     0.00    441.82    368.99   1.01
K       0.09   0.02    0.06     0.13       0.00     0.00    449.79    424.92   1.01
sigma   0.00   0.00    0.00     0.00       0.00     0.00    439.62    362.82   1.01
rho    29.10  13.76   11.87    49.56       0.56     0.45    813.94    579.28   1.00
eta     0.01   0.00    0.01     0.01       0.00     0.00    586.94    420.25   1.00

True \sigma = 0.05:

         mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P        0.79   0.03    0.73     0.84       0.00     0.00    294.47    331.93   1.03
K        0.12   0.04    0.06     0.20       0.00     0.00    293.08    294.46   1.02
sigma    0.00   0.00    0.00     0.00       0.00     0.00    376.89    310.29   1.00
rho    147.81  57.71   58.03   242.88       2.99     2.12    481.07    450.83   1.00
eta      0.05   0.00    0.04     0.06       0.00     0.00    449.36    566.99   1.01

True \sigma = 0.10:

         mean      sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
P        0.82    0.04    0.74     0.89       0.00     0.00    295.21    258.56    1.0
K        0.11    0.05    0.03     0.21       0.00     0.00    306.47    244.42    1.0
sigma    0.00    0.00    0.00     0.00       0.00     0.00    314.85    203.01    1.0
rho    342.86  176.65  133.03   632.52       9.29     6.64    503.39    370.84    1.0
eta      0.11    0.01    0.09     0.13       0.00     0.00    484.31    283.64    1.0

Hi @giorgp, want to let you know I’ve been messing with this a bit but don’t have an answer yet. Will let you know!


Hi @bwengals, great, thanks for letting me know and for your time! Looking forward to any updates you might have.

Finally found it; it turns out it was something small.

class BiasedModel(pm.gp.mean.Mean):
    def __init__(self, P, K):
        self.P = P
        self.K = K

    def __call__(self, X):
        return ((X * self.P) / (X + self.K) + (X*0.015)).flatten() ## flatten this out here

and then also flatten y here:

    likelihood = gp.marginal_likelihood('y_obs', sigma=sigma, y=y_train.flatten(), X=ach_train)

The same fix applies going from Latent to Marginal or any other GP implementation, but I think something has changed slightly in MvNormal to cause this.
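
If it helps to see the mechanism, the shape problem is reproducible with plain numpy: a column vector broadcast against a length-N vector silently produces an (N, N) batch rather than a single vector (illustrative sketch):

import numpy as np

mu = np.zeros((50, 1))  # what the un-flattened mean function returns

# broadcasting against a length-50 vector (like the MvNormal event dimension)
# silently yields a (50, 50) batch instead of one length-50 vector
print((mu + np.zeros(50)).shape)  # -> (50, 50)
print(mu.flatten().shape)         # -> (50,), the shape the GP code expects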


Yes, that solved it! Thanks for your help!