Strategies for validating Gaussian Process models

I fit a GP–NBD model (a latent GP with a negative binomial likelihood) on 3 months’ worth of hourly data, and I’m wondering what strategies people use to validate these models?

  • Considering they’re (pseudo) non-parametric, I’d guess they’re more likely to overfit
  • Since it’s a time-series model, leave-one-out CV (LOO) isn’t applicable
  • It’s incredibly slow to fit (~3–4 hours), so standard time-series cross-validation isn’t very feasible: with 2 years of data to backtest (3 months of training data, 1-week horizon), that’s ~90 folds
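A quick back-of-the-envelope check of that fold count (a toy sketch; the window sizes are just the numbers from this post):

```python
# Rolling-origin backtest: ~2 years of weekly origins, a 13-week
# (~3-month) training window, and a 1-week forecast horizon.
def n_folds(total_weeks, train_weeks, horizon_weeks, step_weeks=1):
    """Number of rolling-origin folds that fit in the backtest window."""
    usable = total_weeks - train_weeks - horizon_weeks + 1
    return max(0, usable // step_weeks)

folds = n_folds(total_weeks=104, train_weeks=13, horizon_weeks=1)
print(folds)  # ~90 folds, matching the estimate above
```

At 3–4 hours per refit, that is roughly 270–360 hours of compute, which is why exhaustive refitting is off the table.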

1) Are there any good alternatives for validating Bayesian time-series models that might be computationally more efficient?

2) Is there a chance to optimize the model I’ve specified below and decrease the training time?

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model6:

    # hour-of-day cycle: period 0.002736 ≈ 1/365.25, i.e. one day with t in years
    cov_func_hour = pm.gp.cov.Periodic(1, period=0.002736, ls=0.1)  # hourly
    gp_hour = pm.gp.Latent(cov_func=cov_func_hour)

    # day-of-week cycle: period 0.019165 ≈ 7/365.25, i.e. one week
    cov_func_dow = pm.gp.cov.Periodic(1, period=0.019165, ls=0.1)  # daily
    gp_dow = pm.gp.Latent(cov_func=cov_func_dow)

    # smooth long-term trend: length scale 0.08333 ≈ one month
    cov_trend = pm.gp.cov.ExpQuad(1, ls=0.08333)
    gp_trend = pm.gp.Latent(cov_func=cov_trend)

    gp = gp_hour + gp_dow + gp_trend

    f = gp.prior('f', X=t)

    alpha = pm.Exponential('alpha', 1)
    obs = pm.NegativeBinomial('obs', mu=tt.exp(f), alpha=alpha, observed=y)
    trace6 = pm.sample(1000, tune=1000)

Here’s a slice of the posterior predictions.

Side question: These traces are huge (>136 MB) - any suggestions for storing them more efficiently?


Hi Kyle,
Thanks for the clean question.

I don’t know if it’s computationally more efficient, but maybe you can look into leave-future-out cross-validation (LFO)? Pinging @OriolAbril, as he’s quite versed in these topics.

Your model and posterior retrodictive plot look good to me! I suppose you did some prior predictive checks to get to these priors? Can you spell out your choices for the kernel priors, BTW?

One thing I’d try is adding predictors to your regression, to “relieve” your GP: right now, the GP may be trying to absorb many sources of variation and uncertainty in the data that could be explained away by classical predictors. That would leave less variation for the GP to capture, and hence make it fit faster (I’m not sure I’m being clear here :grin:).

Also, keep in mind that a GP’s computational cost scales with the cube of the number of data points. Pinging @bwengals as our in-house GP expert :wink:
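To put numbers on that cubic scaling (a toy calculation, assuming a dense GP with no sparse approximation):

```python
# Fitting a dense GP requires factorizing an n x n covariance matrix,
# which costs O(n^3) time and O(n^2) memory.
def relative_cost(n_new, n_old):
    """How many times more expensive a fit gets when n changes."""
    return (n_new / n_old) ** 3

n = 90 * 24  # ~3 months of hourly data, about 2160 points
print(relative_cost(2 * n, n))   # doubling the data => 8x the cost
print(relative_cost(n // 2, n))  # halving it => ~1/8 the cost
```

So shortening the training window or thinning the series (e.g. fitting on 2-hourly aggregates) buys roughly cubic savings in fitting time.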

Using ArviZ’s to_netcdf is usually a good call, as you keep all the InferenceData goodies :wink:

Hope this helps :vulcan_salute:


Take a look at this paper to see if LFO is what you are searching for; from the post, it does look like a good fit. It is not as efficient as PSIS-LOO (which generally requires no refits), but it can give you leave-future-out cross-validation results with only a handful of refits (I’m not sure how many, though).
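For intuition, exact LFO (refitting at every forecast origin) is just the loop below; the paper’s contribution is using PSIS importance sampling to skip most of those refits. This is a toy sketch in which a trivial placeholder (training mean and standard deviation with a Gaussian predictive density) stands in for the real model fit:

```python
import math

def fit(y_train):
    """Placeholder 'fit': training mean and standard deviation."""
    n = len(y_train)
    mu = sum(y_train) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in y_train) / n)
    return mu, sd or 1.0  # guard against zero spread

def log_pred_density(y_new, mu, sd):
    """Gaussian log predictive density of a single future point."""
    return (-0.5 * math.log(2 * math.pi * sd ** 2)
            - (y_new - mu) ** 2 / (2 * sd ** 2))

def exact_lfo(y, min_train):
    """Exact 1-step-ahead leave-future-out CV: refit at every origin
    and accumulate the log predictive density of the next point."""
    elpd = 0.0
    for i in range(min_train, len(y)):
        mu, sd = fit(y[:i])  # refit on everything up to time i
        elpd += log_pred_density(y[i], mu, sd)
    return elpd

series = [3.0, 4.0, 3.5, 5.0, 4.5, 4.0, 6.0, 5.5]
print(exact_lfo(series, min_train=4))
```

With a real model, each `fit` call is an MCMC run, which is exactly why the PSIS-approximated variant (refit only when the Pareto-k diagnostic degrades) matters.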

LFO is not yet implemented in ArviZ, but the infrastructure that allows ArviZ to call PyMC3 and refit the model on subsets of the data is: see this other Discourse answer and the links there. I don’t have time right now to add LFO to ArviZ, but if you are interested in working on it, I would gladly help.


This is fantastic, thank you! I also never quite understood before that adding predictors can relieve the GP, but it makes a ton of sense, since it’s less variation to capture for each data point (and data point = parameter in a GP)!

I originally had Gamma priors as the kernel priors, but I found that they changed the fitting time from 4 hours to 44 hours, which wasn’t really feasible. So I fit to a small subset and fixed the length scales at their (aggressively rounded) means as a v1; I should really improve that. Thinking back, I really should try fitting to fake data at this point, to make sure the slow fitting time is due to the nature of the model and not to a misspecified model struggling with the data.

@OriolAbril I’ve got to be honest, I’m relatively new to the field, so I’m not the most qualified technically, but I am definitely interested in working on it; it’s one of my long-term goals to start contributing to open source (I just imagined I’d know way more before I started).

I just read the paper you linked, and it was really manageable, and the pseudocode looks straightforward. Where do I get started?


I’d recommend starting your ArviZ contributions with a smaller issue, to get familiar with the contributing process, PRs, reviews and so on. Once you are comfortable with that (one PR can already be enough; it depends on each person’s background), start working on LFO. In the meantime, you can get familiar with the paper and ask any questions about it here or in a new topic. I can also help choose a small feature that is related to your interests and the task at hand.

Regarding LFO, it should be added alongside reloo, and the body of the lfo function should actually be quite similar: it should use the same methods of the SamplingWrapper class as reloo does. I will merge in the near future so that all the updates to the base sampling wrapper are available in the ArviZ development version. When you start working on lfo, we can chat here or on Gitter to discuss the API and how to modularize the code. And don’t hesitate to contact me at any point in the process.
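To make the shape of that concrete: a hypothetical lfo driver could mirror reloo’s pattern of asking a wrapper to split the data, refit, and score the held-out block. Everything below is a standalone toy sketch; ToyWrapper, its mean/std “fit”, and the method names as used here are illustrative stand-ins, not ArviZ’s actual SamplingWrapper API:

```python
import math

class ToyWrapper:
    """Illustrative stand-in for a refitting wrapper; a real one would
    refit the PyMC model instead of computing a mean and std."""
    def __init__(self, y):
        self.y = y

    def sel_observations(self, i, horizon):
        # split into (training slice, held-out future block)
        return self.y[:i], self.y[i:i + horizon]

    def sample(self, y_train):
        # "refit" on the training slice
        n = len(y_train)
        mu = sum(y_train) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in y_train) / n)
        return mu, sd or 1.0

    def log_likelihood(self, y_future, fitted):
        # joint Gaussian log predictive density of the held-out block
        mu, sd = fitted
        return sum(
            -0.5 * math.log(2 * math.pi * sd ** 2)
            - (v - mu) ** 2 / (2 * sd ** 2)
            for v in y_future
        )

def lfo(wrapper, n_obs, min_train, horizon=1):
    """Exact leave-future-out CV driven by a wrapper: refit at every
    origin, score the next `horizon` points, accumulate the ELPD."""
    elpd = 0.0
    for i in range(min_train, n_obs - horizon + 1, horizon):
        y_train, y_future = wrapper.sel_observations(i, horizon)
        fitted = wrapper.sample(y_train)
        elpd += wrapper.log_likelihood(y_future, fitted)
    return elpd

y = [3.0, 4.0, 3.5, 5.0, 4.5, 4.0, 6.0, 5.5, 5.0, 6.5]
print(lfo(ToyWrapper(y), n_obs=len(y), min_train=4, horizon=2))
```

The point of the wrapper indirection is that lfo itself never touches PyMC: all backend-specific refitting logic lives in the wrapper, which is exactly how reloo stays backend-agnostic.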

I actually came across this paper today while doing some research on time-series model comparison. Was LFO-CV ever implemented in ArviZ? I found this discussion, which seems to indicate it was, and that you need to use the refitting wrappers to run it? Is there an example notebook or tutorial anywhere showing off the functionality?


Sampling wrappers were already implemented a while ago, but so far only reloo uses them. If you are interested and available, I’d be happy to help you implement that and review your PRs.

You can take a look at the current sampling-wrappers example using reloo to get a feel for how they work (still not updated to PyMC v4, though): Refitting PyMC3 models with ArviZ — ArviZ dev documentation


Yeah, I’m interested; I’ll add it to the docket and let you know when I get the ball rolling. I’ll need to review the refitting wrappers; I remember looking at them and struggling with the recomputing-logp part. Though, clicking the link you provided, it seems like it’s been simplified?


There is also this example: Refitting PyMC3 models with ArviZ (and xarray) — ArviZ dev documentation, which is a bit more manual and might help clarify what is going on when obtaining the pointwise log likelihood values. I’m not sure which issues you ran into; it is probably a bit easier than it was, but it is still an open issue. See, for example, Do not collapse timeseries logp across time dimension · Issue #5741 · pymc-devs/pymc · GitHub