This might be a bit hacky, but the solution I’m currently working with is to define the Gaussian process as an additive GP where there are two components: a white noise component with my complicated mean model, and a Matern 3/2 component with a zero mean model. This allows me to call the white noise component separate from the Matern component, like so:
mumatern, varmatern = gp_matern.predict(xnew[:, None], point=trace[-1],
given={"gp": gp, "X": x[:, None], "y": y, "noise": yerr},
diag=True)
muwhite, varwhite = gp_white.predict(xnew[:, None], point=trace[-1],
given={"gp": gp, "X": x[:, None], "y": y, "noise": yerr},
diag=True)
muwhite then contains the mean model evaluated at trace[-1] and mumatern contains the GP trend. Is this sensible?