The way I understand them, in a Deep Gaussian Process the output of each layer becomes the input to the next layer. How does one predict on new points using these models? Do I need to call gp.conditional on just the input?
I think you’d have to call gp.conditional for each of the GP layers and feed them into each other. I think it’d be super slow though. You could try using pm.gp.HSGP instead of pm.gp.Latent for this.
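Something like this is what I have in mind, as a very rough, untested sketch (two layers, made-up names, fixed hyperparameters; I'm not completely sure pm.gp.Latent is happy taking a symbolic tensor for X/Xnew, so you may need to pass sizes explicitly as done below):

import numpy as np
import pymc as pm

X = np.linspace(0, 10, 50)[:, None]   # stand-in inputs
y = np.random.randn(50)               # stand-in observations

with pm.Model() as model:
    # layer 1: GP from the inputs to a hidden function
    gp1 = pm.gp.Latent(cov_func=pm.gp.cov.ExpQuad(1, ls=1.0))
    h = gp1.prior("h", X=X)

    # layer 2: GP whose "inputs" are the outputs of layer 1.
    # The size can't be inferred from a symbolic X, so pass it explicitly
    # (assumption on my part about how the kwargs are handled).
    gp2 = pm.gp.Latent(cov_func=pm.gp.cov.ExpQuad(1, ls=1.0))
    f = gp2.prior("f", X=h[:, None], size=X.shape[0])

    pm.Normal("y_obs", mu=f, sigma=0.1, observed=y)
    idata = pm.sample()

# Prediction: chain the conditionals, feeding each layer's output
# into the next layer's Xnew, then sample them forward.
X_new = np.linspace(0, 12, 30)[:, None]
with model:
    h_star = gp1.conditional("h_star", Xnew=X_new)
    f_star = gp2.conditional("f_star", Xnew=h_star[:, None])
    preds = pm.sample_posterior_predictive(idata, var_names=["f_star"])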
Yeah, I noticed HSGP recently but don’t really understand it yet. I’ll have to look into that.
@bwengals’s talk/notebook/event can be found here. I assume he was too modest to supply them himself.
Saw the talk a couple of days ago, I’ll take another look. On a related note, the GP API looks a bit strange to me. It almost seems to make predicting on new points an afterthought. From my understanding, gp.conditional needs to be called after pm.sample, which would mean that if we try to do anything involved with the GPs (say, apply a response function to them, or split the output) we’d need to redefine the variables. For example:
import pymc as pm
import pytensor.tensor as pt

gps = []
latents = []
with pm.Model() as model:
    for i in range(...):
        gp = pm.gp.Latent(mean_func=..., cov_func=...)
        gps.append(gp)
        f = gp.prior(f'f_{i}', X=...)
        latents.append(f)
    # stack the per-GP latents into an (n, k) matrix
    f = pt.stack(latents).T
    # response function, so the Dirichlet concentrations are positive
    another_f = pm.math.exp(f)
    y = pm.Dirichlet('y_obs', a=another_f, observed=...)
    idata = pm.sample()
    ...
To predict on new points we’d need quite a bit of duplication:
with model:
    more_fs = []
    for i, gp in enumerate(gps):
        ff = gp.conditional(f'f_star_{i}', Xnew)
        more_fs.append(ff)
    f_star = pt.stack(more_fs).T
    f_star_again = pm.math.exp(f_star)
    ...
I know it’s a bit convoluted, but that’s the best I can express it. For anything other than simple GPs there will be duplication. It’d be nice if we could just swap out the training inputs to make predictions, the way it works for other models.
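For comparison, this is the kind of swap-the-inputs workflow I mean for an ordinary parametric model (a rough sketch; depending on the PyMC version the data container is pm.Data or pm.MutableData):

import numpy as np
import pymc as pm

X = np.random.randn(100)
y = 2 * X + np.random.randn(100)

with pm.Model() as model:
    X_data = pm.Data("X_data", X)
    beta = pm.Normal("beta")
    sigma = pm.Exponential("sigma", 1.0)
    mu = pm.Deterministic("mu", beta * X_data)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    idata = pm.sample()

# Predicting on new inputs is just set_data plus posterior predictive
# sampling; nothing has to be redefined.
with model:
    pm.set_data({"X_data": np.linspace(-3, 3, 50)})
    preds = pm.sample_posterior_predictive(idata, var_names=["mu"])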
No, I see your point. It is a bummer to have to rewrite anything that sits on top of the GP. That’s something that’s bugged us as well, because it would be nice if working with a GP felt more like working with a parametric function. @ricardoV94 has made progress in that direction based on substituting gp.conditional for gp.prior in the graph. We’re (at least I am!) still trying to think through some of the implications of this though.
The main challenge is that GPs are non-parametric: predictions depend on the training data and on the posterior of the GP over the training data. This is why predicting with a GP is different from, say, a standard linear regression model, which is parametric; there, all you need to predict on new data are the coefficient posteriors. Calling gp.conditional after sampling is how one would do predictions for a GP “on paper”, which is why we went with that.
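For reference, the “on paper” conditional for a zero-mean GP, given training inputs $X$, the latent values $f$ over them, and new inputs $X_*$ (notation mine, not from the docs), is

$$
f_* \mid f \sim \mathcal{N}\!\left(K_{*x} K_{xx}^{-1} f,\; K_{**} - K_{*x} K_{xx}^{-1} K_{x*}\right),
$$

where $K_{xx} = k(X, X)$, $K_{*x} = k(X_*, X)$, and $K_{**} = k(X_*, X_*)$. Both $f$ and the kernel hyperparameters come from the posterior, which is why the conditional has to be built after sampling.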
That said, the HSGP is a parametric approximation to the GP prior, meaning you can do predictions using pm.set_data. If you check the docstring for pm.gp.HSGP.prior_linearized you’ll see an example of doing this.
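Roughly, the pattern looks like this (a sketch, not a drop-in recipe: the priors, m, and L are placeholders, and older PyMC versions want pm.MutableData instead of pm.Data and expect you to center X yourself, as done here):

import numpy as np
import pymc as pm

X = np.linspace(0, 10, 100)[:, None]   # training inputs, shape (n, 1)
y = np.random.randn(100)               # stand-in observations

with pm.Model() as model:
    ell = pm.InverseGamma("ell", mu=2.0, sigma=1.0)
    eta = pm.Exponential("eta", lam=1.0)
    cov_func = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell)

    # m basis functions; L is the boundary the approximation is valid within
    gp = pm.gp.HSGP(m=[200], L=[12.0], cov_func=cov_func)

    # Put X in a mutable data container so it can be swapped out later,
    # and center it, since the approximation is built around zero.
    X_mean = X.mean(axis=0)
    X_data = pm.Data("X_data", X)

    # prior_linearized returns the basis (phi) and the square root of the
    # power spectral density, so the GP becomes an ordinary linear model.
    phi, sqrt_psd = gp.prior_linearized(X_data - X_mean)
    beta = pm.Normal("beta", size=200)  # one coefficient per basis function
    f = pm.Deterministic("f", phi @ (beta * sqrt_psd))

    sigma = pm.Exponential("sigma", lam=1.0)
    pm.Normal("y_obs", mu=f, sigma=sigma, observed=y)
    idata = pm.sample()

# Predict at new points the "parametric" way: swap the data and
# resample the deterministic f from the posterior.
X_new = np.linspace(-2, 12, 50)[:, None]
with model:
    pm.set_data({"X_data": X_new})
    preds = pm.sample_posterior_predictive(idata, var_names=["f"])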
The main limitation of HSGPs to be aware of is that they don’t scale in the input dimension, i.e. the number of columns of X. They probably aren’t an option past p = 3 or 4. Otherwise, with large n they perform very well, you can work with them as you would a linear model, and I don’t see why you couldn’t stack them into a deep GP type model.
Thanks @cluhmann. Though I hear many people are saying it is the greatest PyMC HSGP notebook made for a PyCon 2022 talk to date.