Hi, I’m trying to fit a Gaussian process to model the number of hurricanes in the Atlantic basin. The data is uploaded as a CSV.
Since this is count data I use a Negative Binomial likelihood (another option would be a Poisson, but I chose the Negative Binomial because its mean and variance can differ). My attempt:
```python
import numpy as np
import pymc3 as pm  # code uses the PyMC3 API

# X: years as a column vector; y: hurricane counts per year
with pm.Model() as p_model:
    l = pm.Exponential("l", 1)
    sigma = pm.Exponential("sigma", 1)
    cov = sigma**2 * pm.gp.cov.ExpQuad(1, ls=l) + pm.gp.cov.WhiteNoise(1e-5)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X)
    # log link keeps the Negative Binomial mean positive
    μ = pm.Deterministic("μ", pm.math.exp(f))
    alp = pm.Gamma("alp", alpha=2, beta=0.5)
    observations = pm.NegativeBinomial("observations", alpha=alp, mu=μ,
                                       observed=y)
    trace = pm.sample(1500, tune=500, chains=2, target_accept=0.9)
```
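To illustrate the overdispersion argument for the Negative Binomial (a standalone numpy sketch with made-up values of `mu` and `alpha`, not part of the model): in PyMC's `(mu, alpha)` parameterisation the variance is `mu + mu**2 / alpha`, so it can exceed the mean, whereas a Poisson forces variance = mean.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, alpha = 6.0, 2.0  # hypothetical NB mean and shape

# numpy's negative_binomial(n, p) maps to PyMC's (mu, alpha) via
# n = alpha, p = alpha / (alpha + mu)
draws = rng.negative_binomial(alpha, alpha / (alpha + mu), size=200_000)

print(draws.mean())  # ≈ 6.0
print(draws.var())   # ≈ mu + mu**2 / alpha = 24, well above the Poisson's 6
```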
```python
X_new = np.linspace(np.min(huracanes.index.to_numpy().astype("int")) - 2,
                    np.max(huracanes.index.to_numpy().astype("int")) + 1,
                    50)[:, None]

with p_model:
    f_pp = gp.conditional("f_pp", X_new)
    pred_samples = pm.sample_posterior_predictive(trace, var_names=["f_pp"],
                                                  samples=10)
```
```python
import matplotlib.pyplot as plt

_, ax = plt.subplots(figsize=(12, 5))
print(pred_samples['f_pp'].shape)
mu = pred_samples['f_pp'].mean(axis=0)
std = pred_samples['f_pp'].std(axis=0)
ax.plot(X_new.flatten(), mu, 'C1')
ax.fill_between(X_new.flatten(),
                mu - 2 * std, mu + 2 * std,
                color="C1",
                alpha=0.5)
ax.plot(huracanes.index, huracanes["Hurricanes"], 'ko')
ax.set_xlabel('Year')
ax.set_ylabel('# Hurricanes')
```
Adding the white-noise term was essential to keep the covariance matrix positive definite.
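For context, here is the same jitter trick outside PyMC (a standalone numpy sketch with a made-up squared-exponential kernel and grid): a smooth kernel evaluated on a dense grid is numerically near-singular, so a plain Cholesky factorisation can fail, while a tiny diagonal term makes it succeed.

```python
import numpy as np

# Squared-exponential kernel on a dense grid; closely spaced inputs
# make K numerically near-singular.
x = np.linspace(0, 10, 200)[:, None]
K = np.exp(-0.5 * (x - x.T) ** 2 / 2.0**2)

try:
    np.linalg.cholesky(K)  # often fails: K is not numerically PD
except np.linalg.LinAlgError:
    print("Cholesky failed without jitter")

# A 1e-5 diagonal "jitter" restores positive definiteness
L = np.linalg.cholesky(K + 1e-5 * np.eye(len(x)))
print(L.shape)  # (200, 200)
```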
But the fit I obtain is very poor:
Convergence checks (R-hat, etc.) all look fine. How can I improve this model? Thanks.
hurricanes_atlantic.csv (1.4 KB)