@chartl and @junpenglao – I looked over the notebook about Sequential Monte Carlo, and it was very helpful, but it left me with a new question:
The discussion in the notebook is all about sampling the exact same model repeatedly with restarts. But the issue for me, one of those discussed in the “Scaling Bayes” paper, is that I get data in multiple tranches over time.
Conceptually, at least, this is not an issue, because the observations are i.i.d., so the posterior after the first tranche can serve as the prior for the second (well, there’s a complication, but let’s ignore that for now).
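To make the conceptual point concrete, here is a minimal sketch in plain Python (no PyMC3). It assumes a toy model that is not my real one: a Normal likelihood with known noise variance and a conjugate Normal prior on the mean. The function name `update_normal_mean` and all the numbers are made up for illustration. Updating tranche by tranche should give the same posterior as using all the data at once:

```python
import random

def update_normal_mean(prior_mean, prior_var, data, noise_var):
    """Conjugate update for the mean of a Normal with known noise variance."""
    n = len(data)
    if n == 0:
        return prior_mean, prior_var
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

# Hypothetical data arriving in two tranches (true mean 3, noise sd 2).
rng = random.Random(0)
noise_var = 4.0
tranche_1 = [rng.gauss(3.0, 2.0) for _ in range(50)]
tranche_2 = [rng.gauss(3.0, 2.0) for _ in range(30)]

# Option A: condition on everything at once.
batch_mean, batch_var = update_normal_mean(
    0.0, 100.0, tranche_1 + tranche_2, noise_var
)

# Option B: the posterior from tranche 1 becomes the prior for tranche 2.
m1, v1 = update_normal_mean(0.0, 100.0, tranche_1, noise_var)
seq_mean, seq_var = update_normal_mean(m1, v1, tranche_2, noise_var)

print(batch_mean, seq_mean)
print(batch_var, seq_var)
```

In the conjugate case the posterior is available in closed form, so carrying it forward is trivial. With MCMC I only have samples from the first posterior, which is where my question comes from.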
However, there’s a programming complication: PyMC3 models have their observations “baked in,” so I can’t just fit the model on subset 1 of the observations and then move on to subset 2. For that I would have to rebuild the model, and then the old traces wouldn’t apply to the new model.
My guess is that it might be possible to effectively rip out the observations from the trace for subset 1, build a new model that is the same except for a new observed node, and restart. But I don’t believe the API for traces would easily support this.
Alternatively, could we build a new model, use the end points of the old chains as start points for the new chains, and then merge the traces after removing the observed node?
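Here is a rough sketch of the warm-start half of that idea, again in plain Python rather than PyMC3 so that it is self-contained. It uses a hand-written random-walk Metropolis sampler on the same kind of toy Normal-mean model. Everything in it (`log_post`, `metropolis`, the step size, the data) is a hypothetical stand-in, not PyMC3 API. Note that the second run conditions on all the data seen so far, which is an assumption on my part about what “rebuilding the model” would mean:

```python
import math
import random

def log_post(mu, data, noise_sd=2.0, prior_sd=10.0):
    """Unnormalised log posterior: Normal(0, prior_sd) prior, Normal likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - mu) / noise_sd) ** 2 for x in data)
    return log_prior + log_lik

def metropolis(data, start, n_steps, rng, step_sd=0.5):
    """Random-walk Metropolis; returns the list of visited states."""
    mu, lp = start, log_post(start, data)
    chain = []
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, step_sd)
        lp_prop = log_post(proposal, data)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            mu, lp = proposal, lp_prop
        chain.append(mu)
    return chain

rng = random.Random(1)
tranche_1 = [rng.gauss(3.0, 2.0) for _ in range(50)]
tranche_2 = [rng.gauss(3.0, 2.0) for _ in range(30)]

# First run: only tranche 1 is available.
chain_1 = metropolis(tranche_1, start=0.0, n_steps=2000, rng=rng)

# "Rebuilt model": same structure, but observed = all data so far.
# The new chain starts where the old chain ended.
all_data = tranche_1 + tranche_2
chain_2 = metropolis(all_data, start=chain_1[-1], n_steps=2000, rng=rng)

posterior_mean = sum(chain_2) / len(chain_2)
print(posterior_mean)
```

I have deliberately not merged `chain_1` and `chain_2` here, because they target different posteriors (tranche 1 only versus both tranches), and I’m not sure whether merging them would be meaningful. That is part of what I’m asking.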
P.S. Things are a little more complicated in my actual case, because I have a mixture-of-Gaussians model with multiple conditioning variables, so the model would need more extensive surgery. But until I can answer the questions above, I can’t begin to worry about this complication.