Plotting a trace potentially causes my Python kernel to crash?

I’m not sure if this is worth reporting as a bug, since it’s a fairly minor issue, but I recently fit a rather straightforward model (one sampled variable, <500k data points, sampling finished in under 10 minutes with no warnings reported at all), and pm.traceplot(my_difficult_trace) simply does not work on its trace. It eats up all the RAM on my system, lags my computer to a complete standstill, and crashes my kernel, all within a few minutes.

The first issue is that the trace itself may be buggy in some sense, but I reran the model with no errors and the new trace showed the same behavior: pm.traceplot() still would not run. The second issue is that a plot that proves even the slightest bit difficult is enough to freeze my computer within a few minutes, but I guess that’s more of a matplotlib issue :sweat_smile:

Would someone examine the trace I am uploading and see if they experience a similar error? Thank you all for your attention to this matter.

Link to the trace (mega seemed the most efficient way to share it) is here.

Code:

import numpy as np
import pymc3 as pm

with pm.Model() as model:

    # Prior
    # The upper bound is the range of my dataset, multiplied by 5 to give a naive but simple diffuse prior
    sigma = pm.Uniform('sigma',
                       lower=0,
                       upper=5 * (max(np.hstack(my_data[3])) - min(np.hstack(my_data[3]))))

    # Likelihood
    estimate = pm.Normal('estimate', mu=np.mean(np.hstack(my_data[3])), sigma=sigma,
                         observed=np.hstack(my_data[3]))

    trace = pm.sample(cores=4, n_init=20000, draws=2000, tune=8000)

Hi!
Did you try plotting the trace directly with ArviZ instead of PyMC: az.plot_trace(my_difficult_trace)?

Also, from what I understand, you’re just sampling one parameter, right? Is it multi-dimensional? If so, that could cause the RAM saturation – then I’d try az.plot_trace(my_difficult_trace, compact=True), which plots all the dimensions of a multidimensional variable in a single plot.
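For reference, here’s a minimal sketch of what I mean (assuming my_difficult_trace is your trace object and 'sigma' is the variable from your model above):

import arviz as az

# compact=True draws every dimension of a multidimensional variable
# on one pair of axes instead of one row of subplots per dimension
az.plot_trace(my_difficult_trace, var_names=['sigma'], compact=True)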

Thanks for the info. I confirmed that az.plot_trace() works for my other traces, but the difficult one still does not plot, even with compact=True. I nearly froze my whole OS just now testing that out, and had to forcibly shut down my Python kernel once I noticed the session had frozen.

PyMC3 traces have never given me this kind of trouble before for just one variable, no matter how multi-dimensional it was.

Indeed that’s really weird… Sorry I can’t be of more help here – I’ve never encountered this kind of problem before.
Hopefully a more experienced PyMCer can help you out :slight_smile:

Can you transform your trace to InferenceData?

idata = az.from_pymc3(trace)
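For example (just a sketch – I’m assuming model and trace are the objects from your code above, and doing the conversion inside the model context so ArviZ can find the model and observed data):

import arviz as az

# Convert the MultiTrace to InferenceData inside the model context
with model:
    idata = az.from_pymc3(trace)

# Then plot from the InferenceData object
az.plot_trace(idata, compact=True)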

I tried that just now, as the very first thing after booting up my relatively fast MacBook Pro this morning, and the operation ate all of my RAM within a few short minutes, virtually froze my computer, and crashed my kernel. :confused:

Does this mean my trace is just that dense, or that something about it is buggy?

@colcarroll, @aloctavodia, @canyon289 do we have a way to stream straight to disk iteratively (from pymc3)?

I think that would mean you don’t have enough RAM to make the copies.

My first guess is that it has something to do with the size of the observations – 500k data points is pretty big if one of the conversion steps isn’t careful.

Also, I think the n_init argument to pm.sample only applies when initializing with ADVI. I would guess that you have 2,000 draws in 4 chains for 1 element (so 8,000 draws total?) and that PyMC3 is ignoring that argument. If it isn’t ignoring it, that might be a problem.
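Just to make the scale concrete, here’s a rough back-of-the-envelope calculation – under the assumption (and it is only an assumption) that some conversion step materializes a full draws-by-observations array of float64 values:

draws = 4 * 2_000      # 4 chains x 2,000 draws = 8,000 posterior draws
n_obs = 500_000        # approximate number of observed data points
bytes_per_value = 8    # float64

# A full (draws x observations) array would be on the order of 32 GB
print(f"{draws * n_obs * bytes_per_value / 1e9:.0f} GB")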

This is curious! You might manually go through your sample statistics and see if one of them is super big…
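Something along these lines should do it (a sketch, assuming your trace is an ordinary PyMC3 MultiTrace):

import numpy as np

# Print the shape and rough memory footprint of every sampler statistic
for name in trace.stat_names:
    values = np.asarray(trace.get_sampler_stats(name))
    print(name, values.shape, f"{values.nbytes / 1e6:.1f} MB")

# ...and of the sampled variable itself
for var in trace.varnames:
    values = np.asarray(trace.get_values(var))
    print(var, values.shape, f"{values.nbytes / 1e6:.1f} MB")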