Typical workflow in PyMC3


#1

Hi,

I’m still new to PyMC3 and hoping for some advice. I am doing a Bayesian fit with NUTS to a model that is very expensive to evaluate (both the likelihood and the gradient are time-consuming). I run it on computing nodes where I only have terminal access. Sometimes, especially while I’m testing the model, I need to be able to cancel the run before it has completed the set number of steps. I’ve therefore been running with the text backend, which saves each step as a row in a text file that I can then copy to my computer and load. But this seems to disable the energyplot() function, since the text backend does not store energy information (unless I’m missing something).

In browsing other questions here, I came across the pm.trace_to_dataframe(trace) function, which looks like another option for saving the trace to file. But that would only be run after the sampler is done, I suppose, and thus require the run to be carried to the end.
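
The row-per-step saving described above can be sketched in plain Python. This is only an illustration of the idea, not PyMC3's text backend: `run_sampler`, `load_rows`, the file name `draws.csv`, and the `energy` column are all hypothetical, and the random-walk "draw" is a placeholder for a real sampler step. It shows why flushing after every row lets a cancelled run still be loaded afterwards, and that a sampler statistic can be kept alongside the draws if it is written as its own column.

```python
import csv
import os
import random
import tempfile


def run_sampler(path, n_steps, stop_after=None):
    """Write one CSV row per step, flushing after every row.

    Hypothetical stand-in for a slow sampler (not PyMC3 code).
    `stop_after` simulates cancelling the run early.
    """
    rng = random.Random(0)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["step", "theta", "energy"])
        f.flush()
        theta = 0.0
        try:
            for step in range(n_steps):
                if stop_after is not None and step == stop_after:
                    raise KeyboardInterrupt  # simulate Ctrl-C
                theta += rng.gauss(0.0, 1.0)  # placeholder for a real draw
                energy = 0.5 * theta ** 2     # placeholder sampler statistic
                writer.writerow([step, theta, energy])
                f.flush()  # make the row visible on disk immediately
        except KeyboardInterrupt:
            pass  # rows written so far stay in the file


def load_rows(path):
    """Load whatever rows exist, whether or not the run finished."""
    with open(path, newline="") as f:
        return [
            {
                "step": int(row["step"]),
                "theta": float(row["theta"]),
                "energy": float(row["energy"]),
            }
            for row in csv.DictReader(f)
        ]


path = os.path.join(tempfile.mkdtemp(), "draws.csv")
run_sampler(path, n_steps=1000, stop_after=250)
rows = load_rows(path)
print(len(rows), "draws recovered from the cancelled run")
```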

What are your typical workflows in PyMC3? Do you usually run on your own computer, maybe in an interactive session like IPython? Do you use backends, and if so, which ones? How do you make sure you get all the relevant information out of your run?

Thanks!


#2

Hi @jorgenem,

I usually work on my own computer in a Jupyter notebook. I don't use a backend personally, but you can try the hdf5 backend (it is mentioned in the thread "Using text backend raises sampler stats error").

If you are using NUTS, the PyStan workflow is a good place to start (we also recommend similar checks). In general, I start with a small model and simulated data with known parameters, then add complexity step by step. I document the relevant information about model fitting and model comparison in a notebook. For example, @AustinRochford’s recent blog post is an excellent example of a typical PyMC3 workflow.
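
The "small model with simulated data and known parameters" step can be illustrated with a minimal sketch. To keep it dependency-free it does not use PyMC3 or NUTS: it assumes a toy normal model with known noise, whose posterior has a closed form, and every name and number here (`true_mu`, `sigma`, the prior, the sample size) is made up for illustration. The point is only the shape of the check: generate data from a known value, fit, and see whether the posterior covers that value before moving on to a heavier model.

```python
import math
import random

rng = random.Random(42)

# 1. Simulate data from a known parameter (hypothetical values).
true_mu = 2.5   # the value we hope to recover
sigma = 1.0     # observation noise, assumed known in this toy model
n = 200
data = [rng.gauss(true_mu, sigma) for _ in range(n)]

# 2. "Fit" the model. With a Normal(prior_mu, prior_sd) prior on mu and
#    known sigma, the posterior is Normal(post_mean, post_sd) in closed
#    form, so no sampler is needed for this stand-in.
prior_mu, prior_sd = 0.0, 10.0
post_prec = 1.0 / prior_sd ** 2 + n / sigma ** 2
post_sd = math.sqrt(1.0 / post_prec)
post_mean = (prior_mu / prior_sd ** 2 + sum(data) / sigma ** 2) / post_prec

# 3. Check recovery: does the central 95% interval contain the truth?
lower = post_mean - 1.96 * post_sd
upper = post_mean + 1.96 * post_sd
covered = lower <= true_mu <= upper
print(f"posterior mean {post_mean:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
print("true value inside interval:", covered)
```

With a real PyMC3 model the closed-form step would be replaced by sampling, and a single miss is not alarming on its own, since a 95% interval is expected to exclude the true value in roughly 1 of 20 simulated datasets.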


#3

Thanks so much @junpenglao! This is super useful info.