I fit a GP - NBD model on 3 months worth of hourly data and I’m wondering what strategies people use to validate these models?
- Considering they’re (pseudo) non-parametric, I’d guess they’re more likely to overfit
- Since it’s a time series model, loo isn’t applicable
- It’s incredibly slow to fit (~3-4 hours), so a standard time series cross validation isn’t very feasible If I have 2 years worth of data to backtest (3 months of training data, 1 week horizon, so thats ~90 folds).
1) Are there any good alternatives to validating bayesian time series models that might be computationally more efficient?
2) Is there a chance to optimize the model I’ve specified below and decrease the training time?
with pm.Model() as model6:
cov_func_hour = pm.gp.cov.Periodic(1, 0.002736, ls=0.1) # hourly
gp_hour = pm.gp.Latent(cov_func=cov_func_hour)
cov_func_dow = pm.gp.cov.Periodic(1, 0.019165, ls=0.1) # daily
gp_dow = pm.gp.Latent(cov_func=cov_func_dow)
cov_trend = pm.gp.cov.ExpQuad(1, ls=0.08333)
gp_trend = pm.gp.Latent(cov_func=cov_trend)
gp = gp_hour + gp_dow + gp_trend
f = gp.prior('f', X=t)
alpha = pm.Exponential('alpha', 1)
obs = pm.NegativeBinomial('obs', mu=tt.exp(f), alpha = alpha, observed=y)
trace6 = pm.sample(1000, tune=1000)
Here’s is a slice of the posterior predictions.
Side question: These traces are huge (>136 MB) - any alternative suggestions for storing them?