Multidimensional Gaussian process

@Fresch
Unfortunately, I’m not able to share my code anymore (my employer has a restriction on what I can distribute) but I can give you some ideas and pseudo-code which I hope will help.

In my application, I don’t (currently) care about the uncertainty in my GP hyperparameters; I just need the best possible fit. This is because I’m constructing a surrogate model of the likelihood directly rather than trying to build a surrogate of my model response.

I’m adapting a modification of the approach in this paper (https://arxiv.org/abs/1703.09930) to do my sampling.

The process is as follows:
First, you construct a function that evaluates the log-likelihood of a given parameter set against the data. You can do this directly with pymc3:

import theano
import pymc3

# some_model_evaluation is the output of one expensive forward-model run;
# wrapping it in a shared variable lets us swap in new runs without rebuilding the model.
y = theano.shared(some_model_evaluation)
sigma = 1.0  # Assuming sigma isn't a parameter, though you could make it one

with pymc3.Model() as model:
    likelihood = pymc3.Normal('likelihood', mu=y, sd=sigma, observed=data)

# model.logp is a compiled function that takes a point dict; with no free
# random variables the test point is empty.
logp = model.logp

# Now evaluate the logp for the given expensive model evaluation
some_model_evaluation_logp = logp(model.test_point)

# To evaluate another model run do
y.set_value(some_other_model_evaluation)
some_other_model_logp = logp(model.test_point)

From this you can build up two arrays: one of parameter sets, the other of the corresponding log-likelihoods.
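Something like this, where expensive_model(theta) and parameter_sets are placeholders for your own forward model and design points (they aren't from my actual code):

import numpy as np

# Run the expensive model at each design point and record the logp,
# reusing the shared variable y and the compiled logp from above.
parameter_samples = np.asarray(parameter_sets)
log_likelihoods = []
for theta in parameter_samples:
    y.set_value(expensive_model(theta))            # swap in the new model output
    log_likelihoods.append(logp(model.test_point))
log_likelihoods = np.asarray(log_likelihoods).reshape(-1, 1)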

You can now fit a Gaussian process to the log-likelihood as a function of the parameters. I actually do this in GPy rather than pymc3 because, as I mentioned, uncertainty on my hyperparameters is not something I’m really worried about. You could also fit it in scikit-learn, or anywhere you can either write out the GP kernel explicitly or use the kernels already available in pymc3.
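A sketch of what the GPy fit might look like; the RBF kernel with ARD lengthscales is just one choice, and parameter_samples / log_likelihoods are the arrays built above:

import numpy as np
import GPy

# Fit a GP regression to the (parameters, log-likelihood) pairs.
kern = GPy.kern.RBF(input_dim=nparams, ARD=True)
gp = GPy.models.GPRegression(parameter_samples, log_likelihoods, kern)
gp.optimize()

# Keep the fitted hyperparameters around so the same kernel can be rebuilt
# in theano/pymc3 for the sampling step below.
lengthscales = np.array(kern.lengthscale)
kernel_variance = float(kern.variance)
noise_var = float(gp.Gaussian_noise.variance)

One thing to keep in mind: in the pymc3 model below, my_cov has to be something that can operate on theano tensors, so either rebuild the kernel with pymc3’s covariance functions (e.g. pm.gp.cov.ExpQuad with ls set to these fitted lengthscales, scaled by the kernel variance) or write the kernel out in theano yourself.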

Now you have a function that maps parameters to the log-likelihood, so you can construct a pymc3 model via:

import pymc3 as pm
import theano.tensor as tt

# Solve for alpha based on a sampling of parameters and log-likelihood
# evaluations (see algorithm 2.1, http://www.gaussianprocess.org/gpml/chapters/RW2.pdf)
alpha = ...

with pm.Model() as model:
    # Define your priors
    a_ = pm.Normal(...
    b_ = pm.Beta(...
    ...
    # Stack the free variables into a single (1, nparams) row vector
    params_ = tt.stack([a_, b_, ...], axis=0).reshape([1, nparams])

    # Evaluate the GP surrogate of the log-likelihood at params_.
    # my_cov must accept theano tensors here (e.g. a pymc3 cov function
    # or a kernel written out explicitly in theano).
    Ks = my_cov.K(parameter_samples, params_)
    likelihood = pm.Potential('likelihood', tt.dot(Ks.T, alpha))

    # Perform your sampling
    trace = pm.sample(...)
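The alpha = ... line is the standard GP weight vector from algorithm 2.1. A sketch of that solve, where K_train and noise_var are placeholders for the training covariance matrix and the fitted noise variance (e.g. from the GPy fit above):

import numpy as np
from scipy.linalg import cholesky, cho_solve

# Algorithm 2.1 (Rasmussen & Williams): alpha = (K + sigma_n^2 I)^{-1} y.
# A small jitter keeps the Cholesky factorization stable.
K = K_train + (noise_var + 1e-6) * np.eye(len(parameter_samples))
L = cholesky(K, lower=True)
# Use the standardized log-likelihoods here if you standardized before fitting.
alpha = cho_solve((L, True), log_likelihoods)

With that, tt.dot(Ks.T, alpha) in the Potential reproduces the GP posterior mean of the log-likelihood at params_.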

Most of the pitfalls with this approach (that I’ve found) are centered around the surrogate:

  1. I’ve found you need samples all around your optimum to get convergence to good values. Local minima are everywhere.
  2. Fitting the Gaussian process can be non-trivial depending on your log-likelihood. Consider standardizing your data: subtract the mean and divide by the standard deviation before you fit the GP, then, after you evaluate the GP, multiply the response by the standard deviation and add the mean back (a short sketch follows this list).
  3. Outliers in the log-likelihood can really kill your GP fit.
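For point 2, a minimal sketch of that standardization, where ll_mean and ll_std are just the sample statistics of the training log-likelihoods:

import numpy as np

# Standardize the log-likelihoods before fitting the GP.
ll_mean = log_likelihoods.mean()
ll_std = log_likelihoods.std()
log_likelihoods_std = (log_likelihoods - ll_mean) / ll_std

# Fit the GP (and compute alpha) on log_likelihoods_std, then undo the
# transform when you evaluate it, e.g. in the Potential above:
#     pm.Potential('likelihood', ll_std * tt.dot(Ks.T, alpha) + ll_mean)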

Let me know if any of that is unclear or if you need any further help.

Good luck!
