What does pm.sample do right after sampling? Why does it take so long?


#1

I left the computer running the Metropolis sampler. After 5 hours (I wanted many samples because why not) the sampling finished, but the computer has stayed busy for more than 10 hours since. What is PyMC3 doing?


There’s no doubt it is actually doing something, because there is an active process using one core of the CPU at 100%.

I have noticed that something always happens after sampling (reformatting data?), but it never took this long before. I’m curious why this takes longer than the sampling itself, and whether there is a way to prevent it from taking so long.


#2

If you are on master, you might want to turn off the convergence checks by setting compute_convergence_checks=False in pm.sample(...). The current implementation of one of the convergence checks (effective sample size) can take some time when you have more than 10000 samples.

Another possible reason, although it is unlikely with the Metropolis sampler, is that the first chain has finished sampling but other chains are still sampling in the background. Currently, only the progress bar of the first chain is shown.
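A rough sketch of how both suggestions could look in code. The helper names (sample_kwargs, run_metropolis) and the toy model are made up for illustration, and it assumes a PyMC3 version whose pm.sample accepts compute_convergence_checks and cores:

```python
def sample_kwargs(draws, skip_checks=True, sequential=True):
    """Build keyword arguments for pm.sample (hypothetical helper)."""
    kwargs = {
        "draws": draws,
        # False skips the post-sampling diagnostics, including effective sample size
        "compute_convergence_checks": not skip_checks,
    }
    if sequential:
        # With one core the chains run one after another,
        # so no chain keeps sampling in a background process
        kwargs["cores"] = 1
    return kwargs


def run_metropolis(draws=20000):
    # Imported here so the helper above can be used without PyMC3 installed
    import pymc3 as pm

    with pm.Model():
        pm.Normal("mu", mu=0.0, sd=1.0)  # toy model, stands in for the real one
        return pm.sample(step=pm.Metropolis(), **sample_kwargs(draws))
```

Calling run_metropolis() should then return the trace without running the convergence checks. If the diagnostics are still needed, they can be computed separately afterwards, ideally on a thinned or shorter trace.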


#3

I disabled the convergence checks, but it still remains busy after reaching 100%. I didn’t try a run as long as the one in the original post, so I can’t compare quantitatively. I’m now working with much smaller samples, and the delay is not a problem anymore. Maybe some chains are still working in the background as you say; that kind of makes sense with what I see in the system monitor. Thanks anyway.