Slow performance using Bambi compared to the example notebook, please help!

I’m getting to grips with PyMC and Bayesian modeling by running some of the Bambi Jupyter notebook worked examples…

I seem to be having real performance issues just fitting the Bambi GLM models in the example. One model that takes about 7 seconds to fit in the example takes over an hour on my VM, so something must be wrong! (And not with the model setup or code, as it was run directly from the example with the same data.)

This is the example I’m running:

I’m using an Azure VM (Standard E4ads v5: 4 vCPUs, 32 GiB memory).
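For reference, the call I’m timing looks roughly like this (the data and formula below are made-up placeholders, not the actual notebook code, which I ran unmodified):

import numpy as np
import pandas as pd
import bambi as bmb

# Made-up stand-in data; the real notebook supplies its own dataset
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=100)})
df["y"] = 2 + 3 * df["x"] + rng.normal(size=100)

model = bmb.Model("y ~ x", df)
idata = model.fit()  # this is the step that takes over an hour on my VM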

Is this a matter of needing a better VM (more vCPUs?), or is there something else that might have gone wrong in setup to cause such slow sampling?

Any help or suggestions on how to troubleshoot this and speed things up would be appreciated!

Hey there!

Could you share which Bambi version you’re using? I would suggest installing the development version from GitHub.
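If it helps, installing straight from the repo is usually just (assuming pip; this pulls whatever is on the default branch):

pip install git+https://github.com/bambinos/bambi.git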

Hey! Thanks for the quick reply. I’m using Bambi v0.7.1. Are there known issues with any versions?

I’ll try installing the dev version from GitHub as soon as I can and post an update.

There’s only one known issue with predictions, but this is not connected to your problem.

In the meantime, you could also compare the results in the example notebook with the results you obtain on your VM. If they differ a lot, it may indicate a problem with the model specification (which shouldn’t be the case, AFAIK).
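For example, running something like this on both machines and eyeballing the posterior means and sds (a sketch; idata stands for whatever model.fit() returned):

import arviz as az

# idata is the InferenceData returned by model.fit()
summary = az.summary(idata)
print(summary[["mean", "sd", "r_hat"]])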


Hi, I just checked Bambi (versions 0.6.3 and 0.7.1) on an MS Azure notebook, and they both run smoothly.

I used pip to install these specific versions:
pip install scipy==1.7.3 arviz==0.11.4 bambi==0.6.3 watermark
or
pip install scipy==1.7.3 arviz==0.11.4 bambi==0.7.1 watermark

The model fitting takes around 30–40 s (less than a minute) on a small VM (Standard_D11_v2: 2 cores, 14 GB RAM, 100 GB disk).

Here is the watermark for the environment.
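(For anyone following along, output like this comes from the watermark notebook extension; the exact flags below are my guess at what produced it.)

%load_ext watermark
%watermark -n -u -v -iv -w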

Last updated: Wed Apr 27 2022

Python implementation: CPython
Python version : 3.8.1
IPython version : 7.30.1

numpy : 1.18.5
arviz : 0.11.4
logging : 0.5.1.2
matplotlib : 3.2.1
statsmodels: 0.10.2
re : 2.2.1
pandas : 1.1.0
bambi : 0.6.3

Watermark: 2.3.0


Sorry for the slow response, I’ve had limited time to work on this recently, but thankfully I’ve made some progress! I reinstalled the entire environment from scratch using the recommended versions of PyMC and Bambi, and now it samples at a much more reasonable speed (though it only uses one core).
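If I understand correctly, fit() forwards sampler arguments to pm.sample, so I’m hoping something like this will use more cores (an untested guess on my part):

idata = model.fit(draws=1000, chains=4, cores=4)  # chains/cores are passed through to pm.sample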

I’ve been playing with some simple negative binomial models in Bambi, and I have a question about interpreting the posterior predictive visualization in ArviZ. I know my data is relatively zero-inflated, so I would expect to see posterior predictive values at zero (as in the observed data).

What I see is this:

The posterior predictive mean is similar to the observed data for positive integers, but there appear to be no predictions for “0”. Is this expected? Or are the “0” predictions just not displayed in the default ArviZ plot_ppc for the negative binomial distribution?
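Is there a way to check this directly? I was thinking of counting zeros in the draws myself, something like this (a sketch; it assumes model.predict(idata, kind="pps") has filled the posterior_predictive group, and "y" stands in for my response column):

import numpy as np

# posterior predictive draws for the response; "y" is a placeholder name
pps = idata.posterior_predictive["y"].values
print("predicted proportion of zeros:", np.mean(pps == 0))
print("observed proportion of zeros:", np.mean(df["y"].values == 0))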

Any advice would be greatly appreciated… I’m still finding my feet here, but I’ve now worked through “Bayesian Analysis with Python” and “Bayesian Statistics for Beginners”, which has at least given me a baseline level of knowledge to work with 🙂

Hey again!

Could you share a reproducible example so we can investigate? From the chart I see that your data has values like 0, 1, 2, 3, etc., but the model is predicting something quite different.
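Even something minimal along these lines, with simulated counts standing in for your data, would be enough to dig in (all the numbers below are placeholders):

import numpy as np
import pandas as pd
import bambi as bmb
import arviz as az

# Simulate zero-heavy counts as a stand-in for the real data
rng = np.random.default_rng(1234)
n_obs = 500
x = rng.normal(size=n_obs)
y = rng.negative_binomial(n=2, p=0.5, size=n_obs)
y[rng.random(n_obs) < 0.3] = 0  # extra zeros to mimic zero inflation
df = pd.DataFrame({"y": y, "x": x})

model = bmb.Model("y ~ x", df, family="negativebinomial")
idata = model.fit()
model.predict(idata, kind="pps")  # fills the posterior_predictive group
az.plot_ppc(idata)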

On top of that, I recommend updating to Bambi 0.8.0, which was released a few days ago.
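For example:

pip install bambi==0.8.0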
