# Help fitting a GP model that performs poorly

Hi, I’m trying to fit a Gaussian Process to model the number of hurricanes in the Atlantic basin. Data is uploaded as csv.

Since we are counting data I use a Negative Binomial (another option would be a Poisson but I chose a Negative Binomial because mean and variance can be different). My try:

``````with pm.Model() as p_model:

l = pm.Exponential("l", 1)
sigma = pm.Exponential("sigma", 1)

cov = sigma**2 * pm.gp.cov.ExpQuad(1, ls=l) + pm.gp.cov.WhiteNoise(1E-5)

gp = pm.gp.Latent(cov_func=cov)
f = gp.prior("f", X)

μ = pm.Deterministic("μ", pm.math.exp(f))

alp = pm.Gamma("alp", alpha=2, beta=0.5)

observations = pm.NegativeBinomial("observations", alpha=alp, mu=μ,
observed=y)

trace = pm.sample(1500, tune=500, chains=2, target_accept=0.9)

X_new = np.linspace(np.min(huracanes.index.to_numpy().astype("int"))-2,
np.max(huracanes.index.to_numpy().astype("int"))+1,
50)[:,None]

with p_model:
f_pp = gp.conditional("f_pp", X_new)
pred_samples = pm.sample_posterior_predictive(trace, var_names=["f_pp"], samples=10)

_, ax = plt.subplots(figsize=(12,5))
print(pred_samples['f_pp'].shape)
mu = pred_samples['f_pp'].mean(axis=0)
std = pred_samples['f_pp'].std(axis=0)

ax.plot(X_new.flatten(), mu, 'C1')
ax.fill_between(X_new.flatten(),
mu - 2*std, mu + 2*std,
color="C1",
alpha=0.5)
ax.plot(huracanes.index, huracanes["Hurricanes"], 'ko')
ax.set_xlabel('Year')
ax.set_ylabel('# Hurricanes')
``````

Adding white noise was essential to avoid being non positive definite.

But I obtain this very poor performance:

Tests of convergence, rhat, etc seem all to be ok. How can I improve this model? Thx.
hurricanes_atlantic.csv (1.4 KB)

So `f_pp` is coming from `gp.conditional`, so it’s the forecast of `f`, try plotting `exp(f_pp)`

Thanks, you’re of course correct. I missed that. As an extra, I think that a wigglier covariance, like a Matern, would be better than an ExpQuad.

1 Like