# `conditional` and Deep Gaussian Processes

The way I understand them, in a Deep Gaussian Process the output of each layer becomes the input to the next layer. How does one predict on new points using these models? Do I need to call `gp.conditional` on just the input?

I think you'd have to call `gp.conditional` for each of the GP layers and feed them into each other, something like the sketch below. I think it'd be super slow though. You could try using `pm.gp.HSGP` instead of `pm.gp.Latent` for this.
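A completely untested sketch of what I mean for a two-layer model (here `gp1`, `gp2`, `Xnew`, and `idata` are placeholders, and I'm assuming `conditional` will accept the previous layer's output as its new input locations):

```python
with model:
    # conditional of the first layer at the new input locations
    f1_star = gp1.conditional("f1_star", Xnew=Xnew)
    # the second layer's "new inputs" are the first layer's output
    f2_star = gp2.conditional("f2_star", Xnew=f1_star[:, None])
    preds = pm.sample_posterior_predictive(idata, var_names=["f2_star"])
```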


Yeah, I noticed HSGP recently but don't really understand it yet. I'll have to look into that.

@bwengals's talk/notebook/event can be found here. I assume he was too modest to supply them himself.


Saw the talk a couple of days ago; I'll take another look. On a related note, the GP API looks a bit strange to me. It seems to almost treat predicting on new points as an afterthought. From my understanding, `gp.conditional` needs to be called after `pm.sample`, which means that if we do anything involved with GPs (say, apply a response function to them, or split the output) we'd need to redefine those variables. For example:

```python
# assumes: import pymc as pm; import pytensor.tensor as pt
gps = []
latents = []
with pm.Model() as model:
    for i in range(...):
        gp = pm.gp.Latent(mean_func=..., cov_func=...)
        gps.append(gp)
        f = gp.prior(f'f_{i}', X=...)
        latents.append(f)
    f = pt.stack(latents).T
    another_f = pm.math.exp(f)
    y = pm.Dirichlet('y_obs', a=another_f, observed=...)
    pm.sample()
    ...
```

To predict on new points we'd need quite some duplication:

```python
with model:
    more_fs = []
    for i, gp in enumerate(gps):
        ff = gp.conditional(f'f_star_{i}', Xnew)
        more_fs.append(ff)
    f_star = pt.stack(more_fs).T
    f_star_again = pm.math.exp(f_star)
    ...
```

I know it's a bit convoluted, but that's the best way I can express it. For anything other than simple GPs there will be duplication. It'd be nice if we could just swap out the training inputs to make predictions, the way it works for other models.

No, I see your point. It is a bummer to have to rewrite anything that sits on top of the GP. That's something that's bugged us as well, because it would be nice if working with a GP felt more like working with a parametric function. @ricardoV94 has made progress in that direction based on substituting `gp.conditional` for `gp.prior` in the graph. We're (at least I am!) still trying to think through some implications of this though.

The main challenge is that GPs are non-parametric. Predictions depend on the training data and on the posterior of the GP over the training data. This is why predicting with a GP is different from, say, a standard linear regression model, which is parametric: all you need to predict on new data are the coefficient posteriors. Calling `gp.conditional` after sampling is how one would do predictions for a GP "on paper", which is why we went with that.
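To make the parametric case concrete, here is a minimal toy example (not from the thread, names are illustrative) where prediction really is just swapping in the new inputs with `pm.set_data`:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = X_train @ np.array([1.0, -0.5]) + rng.normal(scale=0.1, size=100)

with pm.Model() as linear_model:
    X = pm.Data("X", X_train)
    beta = pm.Normal("beta", 0, 1, shape=2)
    sigma = pm.HalfNormal("sigma", 1)
    mu = X @ beta
    # shape=mu.shape lets y_obs follow the size of whatever X is set to later
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y_train, shape=mu.shape)
    idata = pm.sample()

# Prediction only needs the posteriors of beta and sigma, so swapping the
# inputs is enough; no conditioning on the training data is required.
with linear_model:
    pm.set_data({"X": rng.normal(size=(20, 2))})
    preds = pm.sample_posterior_predictive(idata, var_names=["y_obs"])
```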

That said, the HSGP is a parametric approximation to the GP prior, meaning you can do predictions using `pm.set_data`. If you check the docstring for `pm.gp.HSGP.prior_linearized` you'll see an example of doing this.
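Roughly, the pattern looks like the sketch below, loosely adapted from the `prior_linearized` docstring example. The toy data, the `m`/`c` choices, and the `gp._m_star` attribute are just what I'd reach for here, and the exact signature of `prior_linearized` (e.g. whether it wants centered inputs) has shifted a bit between PyMC versions, so check the docstring for the version you're on:

```python
import numpy as np
import pymc as pm

# Toy 1-D data, purely for illustration
rng = np.random.default_rng(1)
X_train = np.linspace(0, 10, 100)[:, None]
y_train = np.sin(X_train).ravel() + 0.1 * rng.normal(size=100)
X_new = np.linspace(10, 12, 20)[:, None]

with pm.Model() as hsgp_model:
    X = pm.Data("X", X_train)

    ell = pm.InverseGamma("ell", mu=2.0, sigma=1.0)
    eta = pm.Exponential("eta", lam=1.0)
    cov_func = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell)

    gp = pm.gp.HSGP(m=[200], c=1.5, cov_func=cov_func)
    # prior_linearized returns the basis and the square root of the power
    # spectral density, so the GP becomes an explicit linear model in beta
    phi, sqrt_psd = gp.prior_linearized(X)
    beta = pm.Normal("beta", size=gp._m_star)
    f = pm.Deterministic("f", phi @ (beta * sqrt_psd))

    sigma = pm.Exponential("sigma", lam=1.0)
    pm.Normal("y_obs", mu=f, sigma=sigma, observed=y_train, shape=f.shape)
    idata = pm.sample()

# Because the approximation is parametric, prediction is just swapping inputs
with hsgp_model:
    pm.set_data({"X": X_new})
    preds = pm.sample_posterior_predictive(idata, var_names=["f", "y_obs"])
```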

The main limitation of HSGPs to be aware of is that they don't scale in the input dimension, i.e. the number of columns of `X`. They probably aren't an option past p = 3 or 4. Otherwise, with large `n` they perform very well, you can work with them as you would a linear model, and I don't see why you couldn't stack them into a deep GP type model.


thanks @cluhmann. Though I hear that many people are saying that it is the greatest PyMC HSGP notebook made for a PyCon 2022 talk to date.
