Memory spike at the end of the MCMC sampling

Awesome! Thanks @OriolAbril for the fast and informative reply.

I will try this out and see how I get on.

fwiw, the loo package in R has a loo.function() method for dealing with this issue. For large datasets, loo etc. can be calculated iteratively by evaluating loo for each observation in the dataset and summing the results as it goes. This avoids having to store the entire pointwise log-likelihood matrix in memory at any one time. It is slower, but it sidesteps the memory issue. Not sure if something similar could be an option for the arviz loo, but I thought I'd mention it just in case. Obviously it would mean having the log-likelihood function definition for a single observation (not sure this can be derived from the model, so it might have to be explicitly specified by the user), and it also requires having the data stored or provided by the user when they go to evaluate loo. So it might not be ideal within the pymc3 / arviz framework. Maybe Dask is a more viable solution.
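To make the idea concrete, here is a minimal sketch of the streaming approach in Python. This is not the arviz or R loo implementation: it uses plain importance-sampling LOO (no Pareto smoothing, so it is less stable than PSIS-LOO), and `log_lik_fn` is a hypothetical user-supplied single-observation log-likelihood, which is exactly the extra piece the user would have to provide.

```python
import numpy as np

def loo_streaming(log_lik_fn, data, draws):
    """Accumulate elpd_loo one observation at a time, so the full
    (n_draws, n_obs) pointwise log-likelihood matrix is never held
    in memory. Plain IS-LOO (no Pareto smoothing) for brevity."""
    elpd = 0.0
    for y_i in data:
        # log p(y_i | theta_s) for every posterior draw theta_s:
        # only one column of the matrix exists at a time
        log_lik_i = np.array([log_lik_fn(y_i, theta) for theta in draws])
        # IS-LOO: p(y_i | y_-i) ~= 1 / mean_s exp(-log_lik_i),
        # computed stably in log space
        n = len(log_lik_i)
        elpd_i = -(np.logaddexp.reduce(-log_lik_i) - np.log(n))
        elpd += elpd_i
    return elpd

# toy example: normal likelihood with sigma = 1, posterior draws of mu
rng = np.random.default_rng(0)
draws = rng.normal(0.0, 0.1, size=1000)   # pretend posterior draws of mu
data = rng.normal(0.0, 1.0, size=50)      # observed data

def log_lik(y, mu):
    return -0.5 * np.log(2 * np.pi) - 0.5 * (y - mu) ** 2

print(loo_streaming(log_lik, data, draws))
```

Peak memory here is one vector of length `n_draws` instead of an `n_draws x n_obs` matrix, which is the trade-off described above: less memory, more time, and the user must supply both the data and the per-observation likelihood.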
