To add to what @jessegrabowski said, the prior on the lengthscale has a huge effect on sampling time; a swing from 10 minutes to a day isn't unreasonable. But ~700 data points really should be doable IMO. The problem with the Gamma(0.5, 0.2) prior is that it puts a lot of mass at very small lengthscales, below the resolution of your data. Say your x data is on a rectangular grid, 1 "unit" apart, so x[1] - x[0] = 1. Roughly speaking, lengthscale values smaller than 1 are below the resolution of your data (it's not a strict boundary though). When that happens the GP starts to turn into a model of IID Gaussian noise, which is already covered by your likelihood.
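If you want to see this concretely, here's a quick sketch that checks how much prior mass sits below the data resolution. I'm assuming the Gamma(0.5, 0.2) means alpha=0.5, beta=0.2 in PyMC's parameterization and a grid spacing of 1 unit; adjust to whatever you actually have.

import numpy as np
import pymc as pm

# Fraction of the Gamma(0.5, 0.2) lengthscale prior that falls below the
# grid spacing (assumed to be 1 unit here, i.e. x[1] - x[0] = 1)
resolution = 1.0
ls_draws = pm.draw(pm.Gamma.dist(alpha=0.5, beta=0.2), draws=10_000)
print("prior mass below the data resolution:", np.mean(ls_draws < resolution))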
As a starting point, you could try an inverse gamma prior. It has the nice property of putting very little mass at small values. Using the code below, set your lower value carefully: it should be at least the resolution of your x data.
import matplotlib.pyplot as plt
import pymc as pm

# Find InverseGamma parameters that put ~90% of the prior mass between 2 and 10
params_ls = pm.find_constrained_prior(
    distribution=pm.InverseGamma,
    lower=2,
    upper=10,
    init_guess={"alpha": 2, "beta": 10},
    mass=0.9,
)

# Draw from the resulting prior and plot it to check it looks sensible
d = pm.InverseGamma.dist(**params_ls)
s = pm.draw(d, 10_000)
plt.hist(s, 100);
I would double-check the distribution at the end, because every now and then pm.find_constrained_prior will return something that's not quite right without raising an error.
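One way to do that check, continuing from the snippet above (the 2 to 10 bounds and the 0.9 mass are just the values used there), is to confirm the requested mass really lands between your bounds:

import numpy as np

# Sanity check: roughly 90% of the draws should fall between lower=2 and upper=10
print(params_ls)
print("mass in [2, 10]:", np.mean((s > 2) & (s < 10)))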
This all applies btw whether you're using an HSGP or a regular GP. Something to watch out for with HSGPs on spatial / time series models is that they're not great for very small lengthscales relative to the overall width of your x data, in other words for modeling very local relationships in the data. To capture those, m needs to get really large and you lose the speed gains (see the sketch below).
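To get a feel for the numbers, here's a rough back-of-the-envelope sketch using the ExpQuad rule of thumb from Riutort-Mayol et al. (2022), roughly m >= 1.75 * c * S / lengthscale, where S is half the width of the (centered) x data and c is the HSGP boundary factor. The ~700 points at 1-unit spacing are just borrowed from the example above, and the constants are approximate:

# Heuristic number of basis functions m needed to resolve a given lengthscale
# (ExpQuad rule of thumb from Riutort-Mayol et al. 2022; treat as approximate)
S = 700 / 2      # half-width of the x data, assuming ~700 points 1 unit apart
c = 1.5          # HSGP boundary factor
for ell in [100, 20, 5]:
    m = 1.75 * c * S / ell
    print(f"lengthscale {ell}: m ~ {m:.0f} basis functions")

So a lengthscale around 5 already pushes m toward a couple hundred basis functions, which is where the HSGP speed advantage starts to disappear.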