Help fitting a GP model that performs poorly

Hi, I’m trying to fit a Gaussian Process to model the number of hurricanes in the Atlantic basin. Data is uploaded as csv.

Since we are counting data I use a Negative Binomial (another option would be a Poisson but I chose a Negative Binomial because mean and variance can be different). My try:

with pm.Model() as p_model:
    l = pm.Exponential("l", 1)
    sigma = pm.Exponential("sigma", 1)
    cov = sigma**2 *, ls=l) +
    gp =
    f = gp.prior("f", X) 

    μ = pm.Deterministic("μ", pm.math.exp(f))
    alp = pm.Gamma("alp", alpha=2, beta=0.5)
    observations = pm.NegativeBinomial("observations", alpha=alp, mu=μ,
    trace = pm.sample(1500, tune=500, chains=2, target_accept=0.9)

X_new = np.linspace(np.min(huracanes.index.to_numpy().astype("int"))-2,

with p_model:
    f_pp = gp.conditional("f_pp", X_new)
    pred_samples = pm.sample_posterior_predictive(trace, var_names=["f_pp"], samples=10)

_, ax = plt.subplots(figsize=(12,5))
mu = pred_samples['f_pp'].mean(axis=0)
std = pred_samples['f_pp'].std(axis=0)

ax.plot(X_new.flatten(), mu, 'C1')
                 mu - 2*std, mu + 2*std,
ax.plot(huracanes.index, huracanes["Hurricanes"], 'ko')
ax.set_ylabel('# Hurricanes')

Adding white noise was essential to avoid being non positive definite.

But I obtain this very poor performance:

Tests of convergence, rhat, etc seem all to be ok. How can I improve this model? Thx.
hurricanes_atlantic.csv (1.4 KB)

So f_pp is coming from gp.conditional, so it’s the forecast of f, try plotting exp(f_pp)

Thanks, you’re of course correct. I missed that. As an extra, I think that a wigglier covariance, like a Matern, would be better than an ExpQuad.

1 Like