Paper: Exploring various Bayesian Inference techniques with Facebook Prophet

Hi everyone,

My professor and I recently published a paper at a local student conference (one used mostly by students to practice writing papers). In our research, we explored various inference techniques with Facebook Prophet; to do that, we re-implemented Facebook Prophet from scratch in PyMC.

The conference allowed us to post the paper on arXiv. We also published the code of our reimplementation on GitHub, where you can find more details about what we are actually researching. Keep in mind that the GitHub repository is currently out of date and some of the transfer-learning ideas implemented there are incorrect; I will update the repository once we finish our next paper, which documents this research, and find a venue for it.

Since Bayesian statistics is a bit of a niche subject at our university (Ss. Cyril and Methodius University in Skopje, Faculty of Computer Science and Engineering), we did not get a lot of feedback at the conference. So if anyone is interested in the paper, or in the code I shared on GitHub, I would love to hear your feedback.

I hope that I will share our final paper and the complete Vangja package with you soon!

Thanks a lot!


I believe PyMC is able to produce JAX code, at which point you can run through all the inference algorithms in Blackjax. I’d also recommend trying Nutpie, which is much faster than Stan’s strategy for NUTS adaptation.

I took a look at the paper. It's great that you published it. I just wanted to issue a warning about relying too heavily on R-hat and ESS estimates, as they are only really valid when the sampler is mixing well. For example, if you fit something like Neal's funnel with NUTS, the R-hat and ESS look great, but if you compare against the analytically known posterior for the log-scale parameter, you'll see the sampler is not actually exploring the neck or the tail as it should (i.e., it is trivial to reject the hypothesis that it is mixing properly, since we know the true posterior). You can also apply something like simulation-based calibration to check whether your sampler is working.
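The check itself doesn't require any particular library: the funnel's log-scale parameter `v` has a known N(0, 3) marginal, so you can compare any sampler's draws of `v` against that. A minimal NumPy sketch, using a plain random-walk Metropolis sampler rather than NUTS (the density, dimension, step size, and chain length are all illustrative choices, not anything from the paper):

```python
import numpy as np

def funnel_logp(theta, dim=9):
    """Neal's funnel: v ~ N(0, 3), x_i ~ N(0, exp(v / 2)) for i = 1..dim."""
    v, x = theta[0], theta[1:]
    return -0.5 * (v / 3.0) ** 2 - 0.5 * np.exp(-v) * np.sum(x ** 2) - 0.5 * dim * v

def rw_metropolis(logp, init, n_steps, step_size, seed=0):
    """Basic random-walk Metropolis; returns the full chain."""
    rng = np.random.default_rng(seed)
    chain = np.empty((n_steps, init.size))
    cur, cur_lp = init.copy(), logp(init)
    for t in range(n_steps):
        prop = cur + step_size * rng.standard_normal(cur.size)
        prop_lp = logp(prop)
        if np.log(rng.random()) < prop_lp - cur_lp:  # Metropolis accept step
            cur, cur_lp = prop, prop_lp
        chain[t] = cur
    return chain

chain = rw_metropolis(funnel_logp, np.zeros(10), n_steps=20_000, step_size=0.5)
v_draws = chain[:, 0]
# The true marginal of v is N(0, 3); a sampler that fails to reach the
# funnel's neck and tail will show a sample sd well below 3.
print(f"sample sd of v: {v_draws.std():.2f}  (analytic: 3.00)")
```

The same comparison applies to NUTS draws from PyMC or Stan: pull out the `v` column and test it against the known marginal instead of trusting R-hat and ESS alone.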

You also have to be careful about what's going on in the performance comparisons in section C. There are two separate problems: the first is whether you're sampling correctly from the posterior, and the second is whether your model is good and therefore makes good predictions. A poorer sampler (e.g., ADVI) can produce better predictions precisely because it's not actually fitting the model you gave it. I like to tease those two issues apart in evaluations to the extent possible.

Finally, it doesn't make sense to compare ESS without considering compute cost, because we can get ESS as high as we want by running more iterations, assuming we have an ESS > 0 to begin with (for real, not as measured by noisy ESS estimators).
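One way to normalize for compute is to report ESS per second rather than raw ESS. A minimal sketch: the `ess` function below is a crude single-chain autocorrelation-based estimator written for illustration (in practice you'd use ArviZ's `az.ess` on real chains), and the draws here are i.i.d. stand-ins for a sampler's output:

```python
import time
import numpy as np

def ess(x):
    """Crude single-chain ESS: truncate the autocorrelation sum when
    consecutive pairs of autocorrelations stop being positive."""
    n = x.size
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    tau = 1.0  # integrated autocorrelation time
    for k in range(1, n - 1, 2):
        pair = rho[k] + rho[k + 1]
        if pair < 0:
            break
        tau += 2.0 * pair
    return n / tau

rng = np.random.default_rng(0)
t0 = time.perf_counter()
draws = rng.standard_normal(5_000)  # stand-in for timed sampler output
elapsed = time.perf_counter() - t0
print(f"ESS: {ess(draws):.0f}, ESS per second: {ess(draws) / elapsed:.0f}")
```

For a fair comparison across samplers, the timing should cover everything the sampler needs (including warmup/adaptation), not just the post-warmup draws.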

If you're really interested in ADVI, I'd highly recommend Agrawal and Domke, which comes from Abhinav's dissertation work; it is mostly about normalizing flows, but it evaluates a really good ADVI baseline. They introduce several tricks for stabilizing ADVI there and in their previous paper that would be well worth implementing (though taking large numbers of draws to evaluate the ELBO would be prohibitive on your hardware; you really need a good GPU for this).
