I have fit a Gaussian process model to some multivariate X and scalar y data that come from an expensive (and gradient-free) function. Let's use an image from the manual as an example:
Once the Gaussian process is fit, I compute the median values of its hyperparameters and use them in a second pymc3 model, which I sample from using gp.Latent:
```python
# Fixed covariance function built from the median hyperparameters of the first fit
cov_func = median_eta.reshape([1, 1])**2. * hps['covfxn'](len(rvs), ls=median_ls)
gp = pm.gp.Latent(cov_func=cov_func)
gp_cond = gp.conditional('median_gp', Xnew=some_new_X,
                         given={'X': fit_gp.X,
                                'f': fit_gp.y,
                                'y': fit_gp.y,
                                'noise': fit_gp.noise},
                         shape=1)
y_ = pm.Normal('y', mu=gp_cond, sd=sigma, observed=y)
```
This works reasonably well: it approximates the function that generated the original X and y data, it is much faster to evaluate, and I can use gradient-based samplers.
However, I would like to make sure that when I use the latent Gaussian process I stay within the domain where it has relatively high confidence. Using the above figure as an example, I want to make sure I wouldn't end up sampling in the region above 10.
This is especially a problem when my observed y is close to the mean function value of my GP: the samplers then have a tendency to push the solution out into the unobserved domain.
Because my X is multivariate, I can't set bounds directly. I would like to weight my likelihood towards regions where my GP has low variance. Does anyone have any ideas?
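One direction I have been toying with, sketched here in plain NumPy with made-up data (the kernel, the training points, and the weight `lam` are all my own illustrative choices, not anything from the fitted model): subtract a multiple of the GP's closed-form predictive variance from the log-probability, e.g. through a pm.Potential, so that high-variance regions are down-weighted:

```python
import numpy as np

def rbf(XA, XB, eta=1.0, ls=1.0):
    """Squared-exponential kernel (same family as the fitted cov_func)."""
    d2 = np.sum((XA[:, None, :] - XB[None, :, :])**2 / ls**2, axis=-1)
    return eta**2 * np.exp(-0.5 * d2)

def predictive_var(Xnew, Xtrain, eta, ls, noise):
    """GP posterior predictive variance of the latent f at Xnew."""
    K = rbf(Xtrain, Xtrain, eta, ls) + noise**2 * np.eye(len(Xtrain))
    Ks = rbf(Xnew, Xtrain, eta, ls)
    Kss = rbf(Xnew, Xnew, eta, ls)
    v = np.linalg.solve(K, Ks.T)
    return np.diag(Kss - Ks @ v)

rng = np.random.default_rng(0)
Xtrain = rng.uniform(0, 5, size=(20, 2))   # pretend training inputs
x_inside = np.array([[2.5, 2.5]])          # near the training data
x_outside = np.array([[50.0, 50.0]])       # far outside the observed domain

var_in = predictive_var(x_inside, Xtrain, eta=1.0, ls=1.0, noise=0.1)[0]
var_out = predictive_var(x_outside, Xtrain, eta=1.0, ls=1.0, noise=0.1)[0]

# Log-probability penalty: adding -lam * variance to the model's logp
# (e.g. via pm.Potential) pulls the sampler towards low-variance regions.
lam = 5.0
penalty_in, penalty_out = -lam * var_in, -lam * var_out
```

Far from the data the predictive variance reverts to the prior amplitude, so the penalty there is much larger than near the observations, which is exactly the weighting I'm after. I'm unsure how to express this cleanly inside the pymc3 model itself, though.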