How to Use PyMC Distributions as Inputs to GP.conditional

Hi! I’m new to implementing Gaussian Processes in PyMC, and I have a follow-up to a question asked earlier this year on implementing a GP as an emulator for simulated data.

Instead of creating an emulator, @lilianschuster and I would like to use a GP, fitted to simulated data, as a latent model for modeling observed inputs when we only have one response observation.

To do this, I write up a latent GP model that fits the simulated inputs to the simulated response, following the example in the gallery:

with pm.Model() as model:
    
    #define the covariance parameters and function
    input_dim = 2 #number of fit variables
    cov = pm.gp.cov.ExpQuad(input_dim, ls=0.1)
    
    #instantiate latent GP
    gp = pm.gp.Latent(cov_func=cov)

    # np.shape(X_sim) : (663, 2)
    f = gp.prior('latent_gp', X=X_train, 
                       jitter=1e-7)
    gp_sigma = pm.HalfNormal('gp_sigma', sigma=2.0)
    nu = 1 + pm.Gamma('nu', alpha=2, beta=0.1)

    # np.shape(X_sim) : (663, 2)
    y_ = pm.StudentT('y', mu=f, lam=1.0 / gp_sigma, nu=nu, 
                                 observed=y_train)

    latent_trace = pm.sample(1000, chains=2, target_accept=0.9)

The above code runs smoothly, but I get stuck on the rest of the implementation. After this I try to run gp.conditional on two observationally informed distributions for each input dimension.

with model:

    #define the "observational" distributions for inputs to gp.conditional     
    dim1 = pm.TruncatedNormal('dim1_fit', mu=0, sigma=2,
                            lower=-8, upper=8)
    dim2 = pm.TruncatedNormal('dim2_fit',
                                mu=250,
                                sigma=200,
                                lower=1, upper=1000)

    Xnew = np.array([dim1, dim2])
    
    #instantiate the conditional fit of GP to new "data"
    gp_mu = gp.conditional('conditional_fit', Xnew=Xnew)

Trying to fit gp.conditional to Xnew composed of two distribution tensors throws errors, and I’m not sure how to fix this. Even if I try to reshape or add and index of [:,None] to the numpy array, I get the following error message:
IndexError: index 1 is out of bounds for axis 1 with size 1

Additionally, once I get gp.conditional working, I would like to pass it as the center function to sample from a Normal distribution centered on the output of gp.conditional. I think the following code would be able to accomplish this, but I’m not sure how to sample from it while incorporating the previous trace?

with model:

    #sigma_obs and y_obs are both lists with len=1
    sigma = pm.Data('sigma', sigma_obs) 
    observed = pm.Data('observed', y_obs)
    log_l = pm.Normal('log_mb', mu=gp_mu, sigma=sigma,
                      observed=observed)

Thank you so much in advance for any and all help on this!

1 Like

It would be really great to get a GP emulator case study, this seems like such a great use case!

The good news is there’s no problem with using distributions as inputs for the GP, it’s just a shape issue here (which is why it’d a cool case study I think). X_new needs to have the shape (1, 2), so one row and two columns. It’s not smart enough to cast something shaped like (2, ) to the right thing. It reads this as a length two vector. Try replacing Xnew = ... with Xnew = at.stack(([dim1], [dim2])).T. You can check the shape like at.stack(([0.1], [0.2])).T.shape.eval(). There’s probably a cleaner way to define Xnew but this was the first thing I found that got me the shape (1, 2)

1 Like

@bwengals, thank you so much for your response! This is extremely helpful in getting gp.conditional to work. What would be the best way to then sample from the final log_l?

Oh sorry I missed that part of your question. You should be able to to just put all these things into same model block, prior and conditional and everything else, and then sample with NUTS like for any other model. .conditonal under the hood just adds another MvNormal into the model.

1 Like