pm.forestplot and pm.traceplot very slow

I updated pymc3 and suddenly traceplot and forestplot take forever (I can draw the same plots manually using seaborn, etc. an order of magnitude faster).

Also, I noticed the syntax seems to have changed? Before, a variable with b.shape==3 would produce 3 plots; now there is only 1.

Is there somewhere such changes would be documented?

Hi,
PyMC plotting is now handled by ArviZ, so you should find more info on the ArviZ website.

Regarding your question, it’s hard to answer without more info, but if your model has a lot of dimensions, using compact=True is usually recommended.
Also, what versions of PyMC3 and ArviZ are you running on?


PyMC has outsourced the plotting to ArviZ, which uses its own internal data structure. The time spent plotting is likely due to the conversion from the PyMC trace into the ArviZ InferenceData object. You can do this conversion once and then use that object for plotting. Whether a multidimensional variable gets one plot or several is controlled by compact.

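import arviz as az
import pymc3 as pm

with pm.Model() as model:
    ...  # model definition elided in the original post
    trace = pm.sample()
    # Convert once, inside the model context, and reuse idata for every plot
    idata = az.from_pymc3(trace)

az.plot_trace(idata, compact=True);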

Thank you @AlexAndorra and @nkaimcaudle

I found the documentation on the ArviZ website that specifically covers handling pymc3 traces, and it explains things really thoroughly. Looks like very cool functionality!

https://arviz-devs.github.io/arviz/notebooks/Introduction.html

Hmm… I don’t get why I would have thought compact=True would mean the opposite…
It seems like all the RVs combined into one would be more “compact”

Never mind! I get it now.
Compact means that instead of using 3 rows, it plots the 3 components on 1 row.

Somehow I needed to restart the interpreter, because it was only showing 1 row both times.
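
To make the compact behaviour concrete, here is a minimal sketch that does not need a PyMC model at all. It assumes a made-up posterior built with az.from_dict (the random draws and the variable name "b" are purely illustrative, not from the original model) and compares how many rows plot_trace produces with each setting.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import arviz as az

rng = np.random.default_rng(0)

# Illustrative stand-in for a real trace: 2 chains, 200 draws,
# and a single variable "b" with shape 3.
idata = az.from_dict(posterior={"b": rng.normal(size=(2, 200, 3))})

# compact=True: the 3 components of "b" share one row of axes.
axes_compact = az.plot_trace(idata, var_names=["b"], compact=True)

# compact=False: each component of "b" gets its own row.
axes_expanded = az.plot_trace(idata, var_names=["b"], compact=False)

print("rows with compact=True: ", np.asarray(axes_compact).shape[0])
print("rows with compact=False:", np.asarray(axes_expanded).shape[0])
```

Each row has two panels (density on the left, trace on the right), so the row count is the number to compare.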

I’m a little late to the party, but I just want to add some minor notes on the previous answers.

The issue of having only one plot instead of the expected three (I’d guess only the first is shown) sounds similar to https://github.com/arviz-devs/arviz/issues/1023. It has already been fixed, though, so please make sure you are using the latest ArviZ version.

Regarding the plotting functions being slow: as pointed out above, this is due to the conversion to InferenceData having to be performed every time a plotting function is called, and it can be solved by converting once and passing the InferenceData object to the plotting functions directly. The conversion should not be too lengthy in general. In many cases the bottleneck of the conversion is extracting the log likelihood data, if there are a lot of observed values. This can also take up a significant part of your RAM unnecessarily, because trace plots and forest plots do not need the log likelihood. If you are sure the log likelihood data is not needed, you can use log_likelihood=False when calling from_pymc3. We don’t have enough info available to know if this applies to your case, but I figured it’d be worth sharing anyway.
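
As a rough sketch of the "convert once, plot many times" pattern: the snippet below builds a single InferenceData object and reuses it for both a trace plot and a forest plot. To keep it runnable without a PyMC3 install, it assumes az.from_dict with random draws as a stand-in for az.from_pymc3(trace, log_likelihood=False); the variable names "a" and "b" are hypothetical. It also shows that both plots work with only a posterior group, i.e. without any log likelihood data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import arviz as az

rng = np.random.default_rng(1)

# Stand-in for the one-off conversion. With a real PyMC3 trace this would be
# something like: idata = az.from_pymc3(trace, log_likelihood=False)
idata = az.from_dict(
    posterior={
        "a": rng.normal(size=(2, 200)),     # scalar variable (illustrative)
        "b": rng.normal(size=(2, 200, 3)),  # variable with shape 3 (illustrative)
    }
)

# Reuse the same object for every plot; nothing is converted again.
trace_axes = az.plot_trace(idata, var_names=["a", "b"], compact=True)
forest_axes = az.plot_forest(idata, var_names=["a", "b"], combined=True)

# Only the posterior group exists; no log likelihood was stored.
has_log_likelihood = hasattr(idata, "log_likelihood")
print("log_likelihood group present:", has_log_likelihood)
```

The point of the design is simply that the expensive step happens once, and every later plotting call starts from the already-converted object.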
