Bayesian model calibration with Gaussian Process

I am attempting to implement Bayesian model calibration under the classical Kennedy-O’Hagan framework using PyMC3. Briefly, the setup is as follows:

I have n observations z_i of the form

z_i = f(x_i, \theta) + g(x_i) + e_i,

where x_i are known inputs, \theta are unknown calibration parameters, and e_i are iid error terms. We also have m model evaluations y_j of the form

y_j = f(x^*_j, \theta_j),

where both x^*_j (different from the x_i above) and \theta_j are known. Therefore, the data consist of all z_i and y_j. In the paper, Kennedy and O’Hagan model f and g using Gaussian processes:

f \sim GP\{m_1(.,.), \Sigma_1[(.,.),(.,.)] \}

g \sim GP\{m_2(.), \Sigma_2[(.),(.)] \}.

Among other things, the goal is to get posterior samples for the unknown calibration parameters \theta. As I said, I am trying to implement the solution in Python, using PyMC3 in particular.

I have the following challenges:

  • How to define the model in PyMC3 in the first place? I know how to do it when I have one GP for the whole data set, or a sum of GPs for the whole data set. In this case, part of the data follows one model and another part follows another model.

  • \theta is an input into the covariance function for f. There does not seem to be any straightforward way to do this in PyMC3 either.

I will be very grateful for any tips or ideas on how to solve the above-mentioned problems!

If I understand correctly, you have x_i as input and two different GPs evaluated on parts of x_i, which give z_i and y_j, right? You can partition x_i into two parts and model two GPs within a pm.Model() block. Moreover, you can specify \theta once and input it into each GP if it is the same for both GPs.

Not sure I understand why you said \theta is not straightforward to model - are they regular covariance function parameters like length scales etc.?
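
For example, a rough sketch of what I mean (names, priors, and shapes are just placeholders):

import numpy as np
import pymc3 as pm

# placeholder data: one block of inputs/outputs for the observations z_i
# and one for the model runs y_j
X_z = np.random.randn(20, 1)
X_y = np.random.randn(30, 1)
z_obs = np.random.randn(20)
y_obs = np.random.randn(30)

with pm.Model():
    # shared hyperparameters, specified once and passed to both GPs
    ls = pm.Gamma("ls", alpha=2.0, beta=1.0)
    sigma = pm.HalfNormal("sigma", sd=1.0)
    cov = pm.gp.cov.ExpQuad(1, ls)

    gp_z = pm.gp.Marginal(cov_func=cov)
    gp_y = pm.gp.Marginal(cov_func=cov)

    # each GP gets its own marginal likelihood on its part of the data
    z_like = gp_z.marginal_likelihood("z", X=X_z, y=z_obs, noise=sigma)
    y_like = gp_y.marginal_likelihood("y", X=X_y, y=y_obs, noise=sigma)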


I have different x^*_j's that give y_j. Sorry for the mistake, I corrected it in the initial question. I still have a hard time seeing how exactly to define this within the pm.Model() block. The Gaussian processes also depend on some parameters \gamma which, in the case of f, are shared across z_i and y_j. I guess my question is, if I do something like this:

gp1 = pm.gp.Marginal(mean_func1, cov_func1)
gp2 = pm.gp.Marginal(mean_func2, cov_func2)
gp3 = gp1 + gp2

f_1 = gp1.marginal_likelihood("f_1", X_1, y_1, noise)
f_2 = gp3.marginal_likelihood("f_2", X_2, y_2, noise)

will the inference be done properly?

In the case of \theta, it is a vector parameter that is not a parameter of the covariance function; rather, it plays the role of x in the covariance matrix, i.e. I have an input (\theta, x_i) for each z_i (the same \theta for each).

I am far from an expert on PyMC3, but does it make sense to stack your x_i and \theta values into one large X array to pass into your GP? That way all of your x_i and \theta values are inputs.

Something like

import theano.tensor as tt

n = number_of_datapoints
theta1_vals = np.random.normal(loc=theta1_mu,scale=theta1_sd,size=n)
theta2_vals = np.random.normal(loc=theta2_mu,scale=theta2_sd,size=n)
.
.
.

thetas = zip(theta1_vals,theta2_vals,...)

y1 = [some_function(x,theta1,theta2) for theta1,theta2,... in thetas]

X = np.vstack([np.hstack([x,theta]) for theta in thetas])

with pm.Model() as model:
    .
    .
    .

    f1 = gp1.marginal_likelihood('y', X=X, y=y1, noise=sigma_1)
    .
    .
    .

Another thought: if you never need to vary x_i after you have created your training set, maybe your GP only needs to be a function of \theta, even though x_i was used in its creation. Could you just do:

import theano.tensor as tt

n = number_of_datapoints
theta1_vals = np.random.normal(loc=theta1_mu,scale=theta1_sd,size=n)
theta2_vals = np.random.normal(loc=theta2_mu,scale=theta2_sd,size=n)
.
.
.

thetas = zip(theta1_vals,theta2_vals,...)

y1 = [some_function(x,theta1,theta2) for theta1,theta2,... in thetas]

X = np.vstack(thetas)

with pm.Model() as model:
    .
    .
    .

    f1 = gp1.marginal_likelihood('y', X=X, y=y1, noise=sigma_1)
    .
    .
    .

Your Gaussian process model would be len(y) long, so you never need to use x_i when you call it. The first approach would let you make x_i uncertain though, which I could see being helpful sometimes.


You can run some simulations to see if the model can recover the parameters. But it sounds like a valid use case to me.

Hmm, I am not sure how this solves the problem; I need something like:

with pm.Model() as model:
    theta = pm.MvNormal("theta", mu=mean, cov=covariance)

    cov_func = pm.gp.cov.ExpQuad(len(mean) + X.shape[1], lengthscale)

    gp = pm.gp.Marginal(cov_func=cov_func)

    # Now I need to perform something that stacks all (theta, x). Let's call this X1
    f1 = gp.marginal_likelihood('y', X=X1, y=y, noise=sigma)

I do need to perform inference about the unknown RV \theta and get its posterior distribution.

It looks like this works :) Thanks! However, I am still struggling with the theano RV \theta as an input (not a parameter) into a cov function of a GP.

The cov function in PyMC3 accepts a numpy array, theano tensor, or PyMC3 RV as input. You can only do inference on the RVs in general (with some twists you can do it on theano tensors as well).
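
For example, something like this builds a symbolic covariance matrix from an RV input (toy names):

import pymc3 as pm

with pm.Model():
    ls = pm.Gamma("ls", alpha=2.0, beta=1.0)
    x_rv = pm.Normal("x_rv", mu=0.0, sd=1.0, shape=5)
    cov = pm.gp.cov.ExpQuad(1, ls)
    # calling the covariance function on a tensor built from an RV works;
    # K is a symbolic (5, 5) matrix that stays part of the graph
    K = cov(x_rv[:, None])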

Maybe I’m misunderstanding (I haven’t read the Kennedy paper), but your \theta values are the parameter values in your more complicated function, correct? If you never vary x_i but treat it as known, why do you need it as an input to your Gaussian process? Isn’t that information already included in your y vector? i.e.

y = np.array([some_function(x,thetas) for thetas in theta_vals]) #(A n x len(x) matrix)

So now your gp is really only a function of your \theta values which would then be your X?

Please disregard if I’m still not getting it.

I am sorry, I realized that I might not have been entirely clear. I will try to elaborate. I am trying to model something like this:

z_i = f(x_i, \theta) + e_i
y_j = f(x^*_j, \theta^*_j) + e_j

where

f \sim GP\{m(.,.), \Sigma[(.,.),(.,.)]\}

with

m(x,\theta) = x^T \theta

and

\Sigma[(x,\theta),(x',\theta')] = e^{-\sum_i (\frac{\theta_i - \theta'_i}{\gamma_i})^2 - \sum_j (\frac{x_j - x'_j}{\eta_j})^2}.

My observed responses are D = (z_1, \dots, z_n, y_1, \dots, y_m), with covariates \{(x_1, \theta), \dots, (x_n, \theta), (x^*_1, \theta^*_1), \dots, (x^*_m, \theta^*_m)\}, where x_i, x^*_j, \theta^*_j are known fixed values (they are part of my data) and \theta are the unknown calibration parameters.

My goal is to do inference about all \gamma_i, \eta_j, and \theta, and get their posterior distributions. That is why I need to use the unknown \theta as an input into a covariance function, and that is what I am struggling with.
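
For concreteness, I believe the covariance above corresponds (up to the factor of 1/2 in PyMC3's ExpQuad convention) to a product of two ExpQuad kernels acting on different columns of the stacked input [x, \theta], roughly like this:

import pymc3 as pm

# hypothetical dimensions: x in R^p (columns 0..p-1), theta in R^d (columns p..p+d-1)
p, d = 2, 3

with pm.Model():
    eta = pm.Gamma("eta", alpha=2.0, beta=1.0, shape=p)      # length scales for x
    gamma = pm.Gamma("gamma", alpha=2.0, beta=1.0, shape=d)  # length scales for theta

    cov_x = pm.gp.cov.ExpQuad(p + d, ls=eta, active_dims=list(range(p)))
    cov_theta = pm.gp.cov.ExpQuad(p + d, ls=gamma, active_dims=list(range(p, p + d)))
    cov_f = cov_x * cov_theta  # product of the exponentials = exp of the summed exponents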

Oh okay. I’m afraid I’m well out of my depth there then. I will be watching this thread with interest to see if you arrive at a solution.

I found a link where someone was working on your same problem with pymc3 and sklearn:
https://github.com/pymc-devs/pymc3/issues/1574


@NateAM thanks! This seems to be helpful. It does look rather like a workaround, though. I am still wondering if there is a more straightforward solution with PyMC3. Any suggestions @junpenglao?

That post is a bit dated - we have now implemented a full-featured GP class which makes these problems much easier to model.
I think it would help if you could write down the generative process and simulate some data. There are many ways to model what you describe above, and if there are difficulties using the GP API directly, there are usually ways to code the model differently.

Here is a little toy simulation of what I am trying to code:

import numpy as np
import pymc3 as pm

np.random.seed(123)
def black_box(input):
    '''Represents the expensive model f which I am trying to calibrate'''
    l_scale = np.array([1, 1])
    beta = 1
    
    cov_fcn = pm.gp.cov.ExpQuad(2, l_scale)
    cov_mat = cov_fcn(input).eval()
    mean = np.repeat(beta, len(input)) #For simplicity, the mean is constant
    
    return np.random.multivariate_normal(mean, cov_mat).flatten()

x1 = np.random.randn(10)
x2 = np.random.randn(15)
theta_star = np.random.randn(15)
theta_truth = 2

input_1 = np.array([[x_1, theta_truth] for x_1 in x1])
input_2 = np.array([[x2[i], theta_star[i]] for i in range(len(x2))])

#y1 represents observations for known inputs x1 and unknown calibration parameters
y1 = black_box(input_1)
#y2 represents model runs for known x2 and theta_star
y2 = black_box(input_2)

with pm.Model() as toy_model:
    
    #Priors
    theta = pm.Normal("theta", mu = 0, sd = 1)
    beta = pm.Normal("beta", mu = 0, sd = 1)
    l_scale = pm.Gamma("l_scale", alpha= 1, beta = 5, shape = 2)
    σ = pm.HalfCauchy("σ", beta=5)
    
    #Mean and cov of the GP
    mean_fcn = pm.gp.mean.Constant(beta)
    cov_fcn = pm.gp.cov.ExpQuad(2, l_scale)
    
    #This chunk is where it fails###################
    input_1 = np.array([[x_i, theta] for x_i in x1])
    ################################################
    
    gp1 = pm.gp.Marginal(mean_func = mean_fcn, cov_func = cov_fcn)
    gp2 = pm.gp.Marginal(mean_func = mean_fcn, cov_func = cov_fcn)
    
    y_1 = gp1.marginal_likelihood("y1", X=input_1, y=y1, noise=σ)
    y_2 = gp2.marginal_likelihood("y2", X=input_2, y=y2, noise=σ)

This gives me the following error:
AsTensorError: ('Cannot convert [[-1.0856306033005612 theta]\n [0.9973454465835858 theta]\n [0.28297849805199204 theta]\n [-1.506294713918092 theta]\n [-0.5786002519685364 theta]\n [1.651436537097151 theta]\n [-2.426679243393074 theta]\n [-0.42891262885617726 theta]\n [1.265936258705534 theta]\n [-0.8667404022651017 theta]] to TensorType', <class 'numpy.ndarray'>)

Any idea how to make PyMC3 accept something like:

theta = pm.Normal("theta", mu = 0, sd = 1)
input_1 = np.array([[x_i, theta] for x_i in x1])

This is a use case where PyMC3 should really work well! The GP functions can definitely handle this. You can find a somewhat similar example here, where a flat prior is placed over the inducing point locations, which are inputs to a covariance function.

Is there a different theta for each x_i, or is it the same theta? (Is it a scalar or a column vector?) If it’s a scalar, maybe try something like this (you’ll have to double-check these shapes):

theta = pm.Normal("theta", mu=0.0, sd=1.0)
theta_col = tt.alloc(theta, len(x1))
input_1 = tt.concatenate((x1[:, None], theta_col[:, None]), axis=1)

Also, if by chance what you’re working on is open source / online, I’d love to see a link!


Yep, the key is to stack the theano scalar with a numpy array. For example, another way to do it is:

import theano
import theano.tensor as tt
x1shared = theano.shared(x1[:, None])
with pm.Model():
    ...
    input_1 = tt.concatenate([x1shared, tt.tile(theta, len(x1), ndim=2).T], axis=1)
    ...
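
With input_1 built like this, the rest of your toy example should go through unchanged, and the parameter-recovery check would look roughly like (re-using the names from your toy example):

with toy_model:
    # define the likelihoods on the stacked input and sample
    y_1 = gp1.marginal_likelihood("y1", X=input_1, y=y1, noise=σ)
    y_2 = gp2.marginal_likelihood("y2", X=input_2, y=y2, noise=σ)
    trace = pm.sample(1000, tune=1000)

pm.summary(trace)  # check whether the posterior of theta concentrates near theta_truth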

@bwengals and @junpenglao thank you! This actually works well. @bwengals, the theta is the same for each x_i, so what @junpenglao wrote works perfectly! In general \theta \in \mathbb{R}^d, but I have already figured out how to modify the code so it works for a vector as well.
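
For reference, one way the stacking seems to generalize to a vector \theta (a rough sketch; d and the prior are placeholders):

import numpy as np
import theano
import theano.tensor as tt
import pymc3 as pm

d = 3                                 # hypothetical dimension of theta
x1 = np.random.randn(10)
x1shared = theano.shared(x1[:, None])

with pm.Model():
    theta = pm.Normal("theta", mu=0.0, sd=1.0, shape=d)
    # repeat the single theta vector for every row of x1, then stack column-wise
    theta_block = tt.tile(theta, (len(x1), 1))                  # shape (n, d)
    input_1 = tt.concatenate([x1shared, theta_block], axis=1)   # shape (n, 1 + d)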

The mean function of my GP f is also a little bit peculiar. It looks like this:
m(x, \theta) = c(x) + \theta^Tv(x),
where c(x) is a constant and v(x) is a vector in \mathbb{R}^d.

I am implementing this by adding a column to input_2, since I can calculate it explicitly for [x2, theta_star], and then having a linear mean function for gp2 with coeff = [1, 0, ..., 0] and intercept = 0. I will also let the covariance function act only on the appropriate dimensions.

In the case of gp1, I am planning to include v(x)^T as rows in the design matrix x1 and then have a linear mean function with coeff = tt.concatenate([theta, [0, ..., 0]]), again with the covariance function acting only on particular dimensions.
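
A rough sketch of the gp1 mean-function part, assuming a scalar x, a known constant c(x) = c0, and v(x) precomputed in numpy (all names are placeholders, and the \theta columns of the covariance input are left out for brevity):

import numpy as np
import theano.tensor as tt
import pymc3 as pm

d = 2
c0 = 1.0
x1 = np.random.randn(10)
V = np.column_stack([np.ones_like(x1), x1])  # stand-in for v(x), shape (n, d)
X1 = np.column_stack([V, x1])                # design matrix [v(x), x], shape (n, d + 1)

with pm.Model():
    theta = pm.Normal("theta", mu=0.0, sd=1.0, shape=d)
    eta = pm.Gamma("eta", alpha=2.0, beta=1.0)

    # linear mean: intercept c0 plus theta^T v(x); the x column gets coefficient 0
    coeffs = tt.concatenate([theta, tt.zeros(1)])
    mean_fcn = pm.gp.mean.Linear(coeffs=coeffs, intercept=c0)

    # covariance acts only on the x column (the last one)
    cov_fcn = pm.gp.cov.ExpQuad(d + 1, ls=eta, active_dims=[d])

    gp1 = pm.gp.Marginal(mean_func=mean_fcn, cov_func=cov_fcn)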
