Gaussian Process Regression with hyperpriors

My data:

I have a matrix of N predictors, with t observations for the model training:
X (t x N)
I have a target vector (the thing I would like to predict):
y (t,)
I have a vector of test cases for each predictor, in order to perform the forecast:
X_new (1 x N)

I would like to perform a multivariate Gaussian Process regression with hyperpriors p(θ), whereby the optimum hyperparameters θ = (ℓ, a, σ) (see code below) are first determined by Metropolis MCMC, and then the optimum regression weight parameters β are determined (y = βX + ε) in order to perform a forecast y_new = βX_new. However, I'm not quite sure whether I am implementing this correctly in PyMC3…

So my understanding is that the parameters μ and Σ (see code below) correspond to the mean and covariance of the model prior p(β). I'm wondering: does my code below just take one random sample from the defined distribution of each hyperparameter and then perform MCMC sampling on the model posterior p(β | y, X, θ) to get the regression weights? Or does it perform MCMC sampling on both the posterior of the hyperparameters p(θ | y, X) and the model posterior p(β | y, X, θ)? If it's the former, how can I change my code to implement hyperpriors, optimise them with Metropolis MCMC, and finally perform a prediction of y?

Side note - the priors on my hyperparameters would be weakly informative. All I know is that they should all be > 0.
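For reference, the covariance built in the code below is the squared-exponential (ExpQuad) kernel. A minimal NumPy sketch of what the three hyperparameters control (`exp_quad` is my own hypothetical helper, not a library function; σ² would be added on the diagonal as observation noise):

```python
import numpy as np

def exp_quad(X1, X2, ls, a):
    """Squared-exponential kernel: a * exp(-0.5 * sum(((x - x') / ls)**2)).
    ls sets the per-dimension length-scales, a the signal variance."""
    d = (X1[:, None, :] - X2[None, :, :]) / ls  # pairwise scaled differences
    return a * np.exp(-0.5 * np.sum(d ** 2, axis=-1))

rng = np.random.default_rng(0)
Xd = rng.normal(size=(5, 3))
K = exp_quad(Xd, Xd, ls=np.ones(3), a=2.0)
print(K.shape)                        # (5, 5)
print(np.allclose(np.diag(K), 2.0))   # True: diagonal equals the signal variance a
```

All three of ℓ, a, σ must be positive for this kernel (and the noise) to make sense, which is why positive-support priors like Gamma or HalfCauchy are the natural weakly informative choice.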


import numpy as np
import pymc3 as pm

with pm.Model() as model:
    ℓ = pm.Gamma('ℓ', alpha=1, beta=10, shape=X.shape[1])
    a = pm.HalfCauchy('a', beta=10)
    σ = pm.HalfCauchy('σ', beta=10)

    μ = pm.gp.mean.Zero()
    Σ = a * pm.gp.cov.ExpQuad(X.shape[1], active_dims=np.arange(X.shape[1]), ls=ℓ)

    gp = pm.gp.Marginal(mean_func=μ, cov_func=Σ)
    y_ = gp.marginal_likelihood('y_', X=X, y=y, noise=σ)

    MAP = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(1000, step=step, start=MAP, random_seed=1, progressbar=True)
    func = gp.conditional('func', Xnew=X_new)
    fmean, fvar = gp.predict(Xnew=X_new, point=MAP)

I think I figured it out in the end with a bit of playing around:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    # hyperpriors
    ℓ_mu = pm.Normal('ℓ_mu', mu=-10, sd=2)
    ℓ_sd = pm.Gamma('ℓ_sd', alpha=10, beta=10)
    a_mu = pm.Normal('a_mu', mu=-1, sd=2)
    a_sd = pm.Gamma('a_sd', alpha=5, beta=10)
    σ_mu = pm.Normal('σ_mu', mu=-1, sd=2)
    σ_sd = pm.Gamma('σ_sd', alpha=5, beta=10)

    # priors (log-normal via exp transform, so ℓ, a, σ are all > 0)
    logℓ = pm.Normal('log(ℓ)', mu=ℓ_mu, sd=ℓ_sd)
    loga = pm.Normal('log(a)', mu=a_mu, sd=a_sd)
    logσ = pm.Normal('log(σ)', mu=σ_mu, sd=σ_sd)
    ℓ = pm.Deterministic('ℓ', tt.exp(logℓ))  # kernel length-scale
    a = pm.Deterministic('a', tt.exp(loga))  # signal variance
    σ = pm.Deterministic('σ', tt.exp(logσ))  # noise variance

    μ = pm.gp.mean.Zero()
    Σ = a * pm.gp.cov.ExpQuad(X.shape[1], active_dims=np.arange(X.shape[1]), ls=ℓ)

    gp = pm.gp.Marginal(mean_func=μ, cov_func=Σ)
    y_ = gp.marginal_likelihood('y_', X=X, y=y, noise=σ)
    trace = pm.sample(draws=1000, chains=5, step=pm.Metropolis(), random_seed=1, progressbar=False)
    func = gp.conditional('func', Xnew=X_new, pred_noise=True)
    fmean, fvar = gp.predict(X_new, point=trace[-1], pred_noise=True)
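For intuition about what `gp.predict` returns at a single point of the trace: with the hyperparameters held fixed, it is just the standard GP conditional. A self-contained NumPy sketch (my own helper names, assuming a zero mean and an ExpQuad kernel as above):

```python
import numpy as np

def gp_predict(X, y, X_new, ls, a, sigma2):
    """Posterior predictive mean/variance of a zero-mean GP with an
    ExpQuad kernel at fixed hyperparameters ls, a, sigma2."""
    def k(A, B):
        d = (A[:, None, :] - B[None, :, :]) / ls
        return a * np.exp(-0.5 * np.sum(d ** 2, axis=-1))
    K = k(X, X) + sigma2 * np.eye(len(X))      # training covariance + noise
    Ks = k(X_new, X)                            # cross-covariance
    mean = Ks @ np.linalg.solve(K, y)
    cov = k(X_new, X_new) - Ks @ np.linalg.solve(K, Ks.T) + sigma2 * np.eye(len(X_new))
    return mean, np.diag(cov)

rng = np.random.default_rng(2)
Xtr = rng.normal(size=(30, 2))
ytr = np.sin(Xtr[:, 0]) + 0.01 * rng.normal(size=30)
m, v = gp_predict(Xtr, ytr, Xtr[:1], ls=1.0, a=1.0, sigma2=1e-4)
# with near-zero noise, the prediction at a training input is ≈ the observed value
```

Averaging this quantity over many posterior draws of (ℓ, a, σ), rather than plugging in a single point like `trace[-1]`, is what gives the full hyperparameter-marginalised predictive.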

Could you please add fake data? I want to run your code.

You could use a sklearn dataset, maybe, e.g. the Boston house prices.

The data I have been using is zero-mean, and typically N > n. I've made the Boston dataset zero-mean here as well; however, for the Boston data n >> N, so I'm not too sure how the GP will perform. You may also need to adjust the hyperpriors (particularly the characteristic length-scale parameter), as the values from my last comment are probably a bit dataset-specific.

import numpy as np
from sklearn.datasets import load_boston

boston = load_boston()
X = np.zeros(
# create zero-mean predictors (subtract a fitted line from each column)
for N in range([1]):
    p = np.polyfit(np.arange([0]),[:, N], 1)
    line = np.polyval(p, np.arange([0]))
    X[:, N] =[:, N] - line
X_new = X[-1, :].reshape(1, X.shape[1])  # test case
X = X[:-2, :]  # predictors
# create zero-mean target
p = np.polyfit(np.arange([0]),, 1)
line = np.polyval(p, np.arange([0]))
y = - line
truth = y[-1]
y = y[:-2]  # target
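The detrending above is just per-column removal of a least-squares line, which leaves each column zero-mean by construction. A self-contained check on synthetic data (fake data, not Boston, so it runs without sklearn):

```python
import numpy as np

rng = np.random.default_rng(1)
t, N = 100, 4
# columns with a deliberate linear trend added on top of noise
raw = rng.normal(size=(t, N)) + np.linspace(0.0, 5.0, t)[:, None]

X = np.zeros((t, N))
for j in range(N):
    p = np.polyfit(np.arange(t), raw[:, j], 1)        # fit a line to column j
    X[:, j] = raw[:, j] - np.polyval(p, np.arange(t))  # subtract it

print(np.abs(X.mean(axis=0)).max() < 1e-8)  # True: detrended columns are zero-mean
```

Because the fitted line includes an intercept, the residuals of each column sum to zero up to floating-point error, which is exactly the "zero-mean" property the GP's zero mean function assumes.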