No, I see your point. It is a bummer to have to rewrite anything that sits on top of the GP. That’s something that’s bugged us as well, because it would be nice if working with a GP felt more like working with a parametric function. @ricardoV94 has made progress in that direction based on substituting gp.conditional for gp.prior in the graph. We’re (at least I am!) still trying to think through some implications of this though.
The main challenge is that GPs are non-parametric: predictions depend on the training data and on the GP's posterior over that training data. This is why predicting with GPs is different from, say, a standard linear regression model, which is parametric; there, all you need to predict on new data are the coefficient posteriors. Calling gp.conditional after sampling is how one would do predictions with a GP "on paper", which is why we went with that.
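If it helps, here's a rough sketch of that "on paper" workflow with pm.gp.Marginal. The data, priors, and kernel choices below are just placeholders, not a recommendation:

```python
import numpy as np
import pymc as pm

# Toy training data (placeholder)
X = np.linspace(0, 10, 50)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(50)

with pm.Model() as model:
    ell = pm.Gamma("ell", alpha=2, beta=1)
    eta = pm.HalfNormal("eta", sigma=1)
    sigma = pm.HalfNormal("sigma", sigma=1)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell)

    gp = pm.gp.Marginal(cov_func=cov)
    gp.marginal_likelihood("y_obs", X=X, y=y, sigma=sigma)

    idata = pm.sample()

# Prediction happens *after* sampling: condition the fitted GP on new inputs
X_new = np.linspace(10, 12, 20)[:, None]
with model:
    f_pred = gp.conditional("f_pred", X_new)
    preds = pm.sample_posterior_predictive(idata, var_names=["f_pred"])
```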
That said, the HSGP is a parametric approximation to the GP prior, which means you can do predictions using pm.set_data. If you check the docstring for pm.gp.HSGP.prior_linearized you'll see an example of doing this.
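For reference, here's roughly what that looks like (a condensed version of the prior_linearized docstring pattern; the exact argument names, centering behavior, and pm.Data mutability have shifted a bit across PyMC versions, so check the docstring for the version you have installed):

```python
import numpy as np
import pymc as pm

# Placeholder training data
X = np.linspace(0, 10, 200)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(200)
m = 100  # number of basis vectors in the approximation

with pm.Model() as model:
    X_data = pm.Data("X_data", X)  # mutable, so it can be swapped later

    ell = pm.InverseGamma("ell", mu=2.0, sigma=2.0)
    eta = pm.HalfNormal("eta", sigma=1)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, ls=ell)

    gp = pm.gp.HSGP(m=[m], c=1.5, cov_func=cov)
    phi, sqrt_psd = gp.prior_linearized(X_data)

    # The HSGP is linear in these coefficients -- this is the "parametric" part
    beta = pm.Normal("beta", size=m)
    f = pm.Deterministic("f", phi @ (beta * sqrt_psd))

    sigma = pm.HalfNormal("sigma", sigma=1)
    pm.Normal("y_obs", mu=f, sigma=sigma, observed=y, shape=X_data.shape[0])

    idata = pm.sample()

# Predict at new inputs by swapping the data, just like a parametric model
X_new = np.linspace(10, 12, 50)[:, None]
with model:
    pm.set_data({"X_data": X_new})
    preds = pm.sample_posterior_predictive(idata, var_names=["f"])
```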
The main limitation of HSGPs to be aware of is that they don't scale in the input dimension, i.e. the number of columns of X. They probably aren't an option past p = 3 or 4. Otherwise, with large n they perform very well, you can work with them as you would a linear model, and I don't see why you couldn't stack them into a deep GP type model.