Predicting on new data with gp.conditional

Hi,

My current model is as follows:

with pm.Model() as model:

  # priors on the kernel hyperparameters
  l = pm.Gamma("l", alpha=2, beta=1)
  offset = pm.Gamma("offset", alpha=2, beta=1)
  nu = pm.HalfCauchy("nu", beta=1)

  # degree-2 polynomial kernel, scaled by nu**2
  cov = nu ** 2 * pm.gp.cov.Polynomial(X.shape[1], l, 2, offset)

  gp = pm.gp.Marginal(cov_func=cov)

  # observation noise and the marginal likelihood over the training data
  sigma = pm.HalfCauchy("sigma", beta=1)
  y_ = gp.marginal_likelihood("y", X=X, y=Y, noise=sigma)

  map_trace = [pm.find_MAP()]

with model:
  f_pred = gp.conditional('f_pred', X_New)

with model:
  pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)
  y_pred_custom, uncer = pred_samples['f_pred'].mean(axis=0), pred_samples['f_pred'].std(axis=0)

This works okay.

But when I try to predict on some new data, say X_New2, the gp.conditional statement throws the error Variable name f_pred already exists.

I tried using a data container (shared variables) for this, but I cannot seem to configure it properly.

Can someone point me in the right direction as to how I can use this model to predict on different data / datasets?

Thanks 🙂

junpenglao
Read the forum a bit and thought you’d have some input on this. I just want to use this same model to get predictions over and over again. Do you have any thoughts on how to go about this? Thanks 🙂

So what you’re doing here is drawing full samples from the GP, then calculating the pointwise mean and std:

pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)
  y_pred_custom, uncer = pred_samples['f_pred'].mean(axis=0), pred_samples['f_pred'].std(axis=0)
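To make the shapes explicit, here’s a tiny numpy stand-in for that reduction (the array below is fake data standing in for pred_samples['f_pred'], not a real GP draw):

```python
import numpy as np

# pred_samples['f_pred'] is an array of full GP draws with shape
# (n_samples, n_points), so reducing over axis=0 gives the pointwise
# mean and std across the draws.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.5, size=(2000, 10))  # stand-in array
mu = samples.mean(axis=0)  # pointwise mean, shape (10,)
sd = samples.std(axis=0)   # pointwise std, shape (10,)
```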

This is unnecessary, since the mean and std at each point can be calculated analytically. Use gp.predict instead; it’ll be much faster, and you can dynamically specify new Xnew values however you want. You can even provide it new observed data on the fly via the given argument (see here, here). Note this doesn’t re-optimize the hyperparameters for each new set of observations you provide; it merely evaluates the GP on the new data using the already-optimized hyperparameters, which may or may not be what you want.
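For intuition, this is roughly the linear algebra gp.predict performs once the hyperparameters are fixed. A minimal numpy sketch, assuming a squared-exponential kernel and toy data for simplicity (your model uses a Polynomial kernel, but the posterior formulas are the same; this is not PyMC’s actual implementation):

```python
import numpy as np

def sqexp(A, B, ell=0.2, eta=1.0):
    # squared-exponential covariance between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return eta ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def analytic_predict(X, y, Xnew, sigma=0.05):
    # analytic GP posterior mean and pointwise std at Xnew, given (X, y)
    K = sqexp(X, X) + sigma ** 2 * np.eye(len(X))  # noisy train covariance
    Ks = sqexp(X, Xnew)                            # train/test cross-covariance
    Kss = sqexp(Xnew, Xnew)                        # test covariance
    mu = Ks.T @ np.linalg.solve(K, y)              # posterior mean
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)      # posterior covariance
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

X = np.linspace(0, 1, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
Xnew = np.linspace(0, 1, 7)[:, None]
mu, sd = analytic_predict(X, y, Xnew)
```

Because the mean and covariance come out in closed form, no sampling loop is needed, which is why gp.predict is so much cheaper than drawing 2000 conditional samples.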

Now, if you do need full samples from the GP, a sort of hacky way is to just give each new variable a new name, e.g.:

with model:
  f_pred2 = gp.conditional('f_pred2', X_New2)
with model:
  pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred2], samples=2000)
  y_pred_custom, uncer = pred_samples['f_pred2'].mean(axis=0), pred_samples['f_pred2'].std(axis=0)

If you wanted to keep things programmatically simpler (though likely harder to interpret), you could keep the external variable name the same and just change the internal name:

with model:
  f_pred = gp.conditional('f_pred2', X_New2)
with model:
  pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)
  y_pred_custom, uncer = pred_samples['f_pred2'].mean(axis=0), pred_samples['f_pred2'].std(axis=0)

That being said, I share your pain. It seems like there should be a way to overwrite variables inside a model, although perhaps the compiled nature of the model gets in the way. Or, if not that, then an ephemeral way to specify prediction points for drawing full samples (similar to gp.predict or the given input) would be really useful.


Thank you for this lengthy reply. Really appreciate it.

I was going over this for a long time looking for a way to implement it. I finally settled on using a uuid for the variable name, which changes with every iteration, just like you suggested. I will definitely try the gp.predict method as well.
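Concretely, the uuid trick looks roughly like this (fresh_name is just an illustrative helper name; the commented lines sketch where it plugs into the model above):

```python
import uuid

def fresh_name(base="f_pred"):
    # illustrative helper: append a unique suffix so every call to
    # gp.conditional gets a variable name the model hasn't seen yet
    return f"{base}_{uuid.uuid4().hex[:8]}"

name = fresh_name()
# with model:
#     f_pred = gp.conditional(name, X_New2)
#     pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)
```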

Thanks again!

I’m able to use a theano.shared for X_new; does this not work?

X_New_shared = theano.shared(X_New)

with model:
  f_pred = gp.conditional('f_pred', X_New_shared, shape=(X_New.shape[0], )) # needed to specify shape

then run ppc sampling

with model:
  pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)

then swap out the shared value

X_New_shared.set_value(different_X_New)

then rerun sample_posterior_predictive,

with model:
  pred_samples = pm.sample_posterior_predictive(map_trace, vars=[f_pred], samples=2000)

Is this what you meant? But yes, I agree with @BioGoertz: it would be nice to be able to overwrite variables. I think it must be possible, but I’m not sure.


Hi @bwengals

Should we not re-run

with model:
  f_pred = gp.conditional('f_pred', X_New_shared, shape=(different_X_New.shape[0], ))

every time I need to predict? However, every time this function is run, it needs a new variable name.

I don’t think you need to. If you’ve already specified the model and you just want to keep changing X_new while still getting predictions from gp, then I think using a shared var here will do the trick, right?


That’s good to know! I always forget the power of shared variables…