You’re on the right track. The mean function and the variance both have to be gp.Latent. Try:
# Level 2, instead of Marginal
gp_l = pm.gp.Latent(...)
gp_sig = pm.gp.Latent(...)
# and then actually construct the GP
lengthscale = gp_l.prior("lengthscale", X=Xtrain[:,None])
variance = gp_sig.prior("variance", X=Xtrain[:,None])
# Level 1
cov_obs = tt.outer(variance, variance) * pm.gp.cov.Gibbs(1, lengthscale)
You can check the docs to read about the difference between Marginal and Latent. In short, Marginal assumes the observation noise is Gaussian, and the GP f is integrated out,
p(y | X) = \int p(y \mid f) p(f \mid X) df
where, Latent makes no assumptions about the observation noise, and f isn’t integrated out. So that assumption applies to gp_obs, but not gp_sig or gp_l.