Predicting on new data with gp.conditional

Hi,

My current model is as follows:

with pm.Model() as model:

  # priors on the kernel hyperparameters
  l = pm.Gamma("l", alpha=2, beta=1)
  offset = pm.Gamma("offset", alpha=2, beta=1)
  nu = pm.HalfCauchy("nu", beta=1)

  # degree-2 polynomial kernel, scaled by nu**2
  cov = nu ** 2 * pm.gp.cov.Polynomial(X.shape[1], l, 2, offset)

  gp = pm.gp.Marginal(cov_func=cov)

  # observation noise and the marginal likelihood over the training data
  sigma = pm.HalfCauchy("sigma", beta=1)
  y_ = gp.marginal_likelihood("y", X=X, y=Y, noise=sigma)

  map_trace = [pm.find_MAP()]

with model:
  f_pred = gp.conditional('f_pred', X_New)

with model:
  pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)
  y_pred_custom, uncer = pred_samples['f_pred'].mean(axis=0), pred_samples['f_pred'].std(axis=0)

This works okay.

But when I try to predict on some new data, say X_New2, the gp.conditional statement throws the error Variable name f_pred already exists.

I tried using a data container (shared variables) for this, but I cannot seem to configure it properly.

Can someone point me in the right direction as to how I can use this model to predict on different data / datasets?

Thanks 🙂

junpenglao
Read the forum a bit and thought you’d have some input on this. I just want to use this same model to get predictions over and over again. Do you have any thoughts on how to go about this? Thanks 🙂

So what you’re doing here is drawing full samples from the GP, then calculating the pointwise mean and std:

pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)
  y_pred_custom, uncer = pred_samples['f_pred'].mean(axis=0), pred_samples['f_pred'].std(axis=0)
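To make the shapes explicit, here’s a tiny numpy stand-in for that reduction (the array below is fake data standing in for pred_samples['f_pred'], not a real GP draw):

```python
import numpy as np

# pred_samples['f_pred'] is an array of full GP draws with shape
# (n_samples, n_points), so reducing over axis=0 gives the pointwise
# mean and std across the draws.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.5, size=(2000, 10))  # stand-in array
mu = samples.mean(axis=0)  # pointwise mean, shape (10,)
sd = samples.std(axis=0)   # pointwise std, shape (10,)
```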

This is unnecessary, since the mean and std at each point can be calculated analytically. Use gp.predict instead; it’ll be much faster, and you can dynamically specify new Xnew values however you want. You can even provide it new observed data on the fly via the given argument (see here, here). Note this doesn’t re-optimize the hyperparameters for each new set of observations you provide; it merely evaluates the GP on the new data using the already-optimized hyperparameters, which may or may not be what you want.
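For intuition, this is roughly the linear algebra gp.predict performs once the hyperparameters are fixed. A minimal numpy sketch, assuming a squared-exponential kernel and toy data for simplicity (your model uses a Polynomial kernel, but the posterior formulas are the same; this is not PyMC’s actual implementation):

```python
import numpy as np

def sqexp(A, B, ell=0.2, eta=1.0):
    # squared-exponential covariance between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return eta ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def analytic_predict(X, y, Xnew, sigma=0.05):
    # analytic GP posterior mean and pointwise std at Xnew, given (X, y)
    K = sqexp(X, X) + sigma ** 2 * np.eye(len(X))  # noisy train covariance
    Ks = sqexp(X, Xnew)                            # train/test cross-covariance
    Kss = sqexp(Xnew, Xnew)                        # test covariance
    mu = Ks.T @ np.linalg.solve(K, y)              # posterior mean
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)      # posterior covariance
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

X = np.linspace(0, 1, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
Xnew = np.linspace(0, 1, 7)[:, None]
mu, sd = analytic_predict(X, y, Xnew)
```

Because the mean and covariance come out in closed form, no sampling loop is needed, which is why gp.predict is so much cheaper than drawing 2000 conditional samples.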

Now, if you do need full samples from the GP, a sort of hacky way is to just give each new variable a new name, e.g.:

with model:
  f_pred2 = gp.conditional('f_pred2', X_New2)
with model:
  pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred2], samples=2000)
  y_pred_custom, uncer = pred_samples['f_pred2'].mean(axis=0), pred_samples['f_pred2'].std(axis=0)

If you wanted to keep things programmatically simpler (though likely harder to interpret), you could keep the external variable name the same and just change the internal name:

with model:
  f_pred = gp.conditional('f_pred2', X_New2)
with model:
  pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)
  y_pred_custom, uncer = pred_samples['f_pred2'].mean(axis=0), pred_samples['f_pred2'].std(axis=0)

That being said, I share your pain. It seems like there should be a way to overwrite variables inside a model, although perhaps the compiled nature of the model gets in the way. Or, if not that, then an ephemeral way to specify prediction points for drawing full samples (similar to gp.predict or the given input) would be really useful.


Thank you for this lengthy reply. Really appreciate it.

I was going over this for a long time looking for a way to implement it. I finally settled on using a uuid for the variable name, which changes with every iteration, just like you suggested. I will definitely try the gp.predict method as well.
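Concretely, the uuid trick looks roughly like this (fresh_name is just an illustrative helper name; the commented lines sketch where it plugs into the model above):

```python
import uuid

def fresh_name(base="f_pred"):
    # illustrative helper: append a unique suffix so every call to
    # gp.conditional gets a variable name the model hasn't seen yet
    return f"{base}_{uuid.uuid4().hex[:8]}"

name = fresh_name()
# with model:
#     f_pred = gp.conditional(name, X_New2)
#     pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)
```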

Thanks again!

I’m able to use a theano.shared for X_new; does this not work?

X_New_shared = theano.shared(X_New)

with model:
  f_pred = gp.conditional('f_pred', X_New_shared, shape=(X_New.shape[0], )) # needed to specify shape

then run ppc sampling

with model:
  pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)

then swap out the shared value

X_New_shared.set_value(different_X_New)

then rerun sample_posterior_predictive,

with model:
  pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)

Is this what you meant? But yes, I agree with @BioGoertz: it would be nice to be able to overwrite variables. I think it must be possible, but I’m not sure.


Hi @bwengals

Should we not re-run

with model:
  f_pred = gp.conditional('f_pred', X_New_shared, shape=(different_X_New.shape[0], ))

every time I need to predict? However, every time this function is run, it needs a new variable name.

I don’t think you need to. If you’ve already specified the model and you just want to keep changing X_new while still getting predictions from gp, then I think using a shared var here will do the trick, right?


That’s good to know! I always forget the power of shared variables…