Plotting a trace causes my Python kernel to crash?

I’m not sure whether this is worth reporting as a very minor error, but I recently fit a rather straightforward model (one sampled variable, <500k data points, sampling done in under 10 minutes with no warnings reported at all), and pm.traceplot(my_difficult_trace) simply does not work on its trace. Within a few minutes it takes up all the RAM in my system, crashes my kernel, and lags my computer to a complete standstill.

My first thought was that the trace itself is buggy in some sense, but I reran the model with no errors, and the new trace displayed the same behavior: pm.traceplot() still would not work. The second issue is how problematic it is that a plot proving the slightest bit difficult can freeze my computer within a few minutes, but I guess that’s a matplotlib problem :sweat_smile:

Would someone examine the trace I am uploading and see if they experience a similar error? Thank you all for your attention on this matter.

The link to the trace is here (MEGA seemed the most efficient way to share it).

Code:

import numpy as np
import pymc3 as pm

obs = np.hstack(my_data[3])

with pm.Model() as model:
    # Prior: the upper bound is the range of my dataset, multiplied by 5
    # to give a naive but simple diffuse prior
    sigma = pm.Uniform('sigma', lower=0, upper=5 * (obs.max() - obs.min()))

    # Likelihood
    estimate = pm.Normal('estimate', mu=obs.mean(), sigma=sigma, observed=obs)

    trace = pm.sample(cores=4, n_init=20000, draws=2000, tune=8000)

Hi!
Did you try plotting the trace directly with ArviZ instead of PyMC3: az.plot_trace(my_difficult_trace)?

Also, from what I understand, you’re just sampling one parameter, right? Is it multi-dimensional? If so, that could cause the RAM saturation – then I’d try az.plot_trace(my_difficult_trace, compact=True), which plots multidimensional variables in a single plot, as sketched below.
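
A minimal sketch of what I mean, assuming my_difficult_trace is the trace returned by pm.sample:

import arviz as az

# plot directly with ArviZ; compact=True folds all dimensions of a
# multidimensional variable into a single pair of panels
az.plot_trace(my_difficult_trace, compact=True)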

Thanks for the info. I confirmed that az.plot_trace() works for my other traces, but the difficult one still does not plot, even if I set compact=True. I almost froze my whole OS just now testing that out, and I had to forcibly shut down my Python kernel once I noticed the session had frozen.

PyMC3 traces have never given me this kind of trouble before for just one variable, no matter how multi-dimensional it was.

Indeed that’s really weird… Sorry I can’t be of more help here – I’ve never encountered this kind of problem before.
Hopefully a more experienced PyMCer can help you out :slight_smile:

Can you transform your trace to InferenceData?

idata = az.from_pymc3(trace)

I tried that just now, as the very first thing after my relatively fast MacBook Pro booted up in the morning, and the operation took all of my RAM in a few short minutes, virtually froze my computer, and crashed my kernel. :confused:

Does this mean my trace is just that dense, or is something buggy about it?

@colcarroll, @aloctavodia, @RavinKumar do we have a way to stream straight to disk iteratively (from pymc3)?

I think that would mean you don’t have enough RAM to make copies.

My first guess is that it has something to do with the size of the observations – 500k is pretty big if one of the conversion steps isn’t careful.

Also, I think the n_init argument to pm.sample only applies when initializing with ADVI. I would guess that you have 2,000 draws in 4 chains for 1 element (so 8,000 draws total?) and that PyMC3 is ignoring that argument. If it isn’t, that might be a problem – see the sketch below.
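
A sketch of what I mean, under that assumption (not a definitive statement of the API):

# assuming the model from the original post; n_init sets the number of
# initializer iterations and, as far as I know, only takes effect for
# ADVI-based init methods:
with model:
    trace = pm.sample(draws=2000, tune=8000, cores=4,
                      init='advi+adapt_diag', n_init=20000)
# with the default init ('jitter+adapt_diag'), n_init should be ignored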

This is curious! You might manually go through your sampler statistics and see if one of them is suspiciously big…
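
For example, a quick sketch (assuming trace is the MultiTrace returned by pm.sample) that prints the shape and in-memory size of each sampler statistic:

import numpy as np

for name in trace.stat_names:
    stat = np.asarray(trace.get_sampler_stats(name))
    print(name, stat.shape, stat.nbytes / 1e6, 'MB')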

This is probably the fault of how log likelihood variables are stored (growing lists inside loops), which is not very memory efficient – roughly the pattern sketched below.
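
A toy illustration of why that pattern hurts (made-up small sizes; this is not ArviZ’s actual code):

import numpy as np

ndraw, nobs = 2000, 5000  # toy sizes, far smaller than the real case

# growing a list of per-draw arrays keeps every chunk alive, and the final
# stack copies them all, so peak memory is roughly double the result
chunks = []
for _ in range(ndraw):
    chunks.append(np.zeros(nobs))
log_like = np.stack(chunks)

# a friendlier pattern: preallocate once and fill rows in place
log_like = np.empty((ndraw, nobs))
for i in range(ndraw):
    log_like[i] = 0.0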

I have created an issue in ArviZ, with some more details on the problem and possible solutions, to keep in mind that this should be improved. There were similar issues with pymc3.sample_posterior_predictive, so the fix for the log_likelihood issue could be based on that.


@Gon_F We have just merged a PR that should fix this issue. Installing the ArviZ development version should do it.
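
If it helps, installing straight from the GitHub repository should do it (the exact command may vary with your setup):

pip install git+https://github.com/arviz-devs/arviz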

It optimizes memory usage and also adds a log_likelihood argument to from_pymc3, which allows you to omit log likelihood data storage entirely. In your case, even the optimized version can be quite challenging, given that it is an array of shape (nchain, ndraw, 500k). If you do not need it (e.g. quantities like loo or waic are not necessary), you can just set log_likelihood=False.
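
For a back-of-the-envelope sense of the size (assuming float64 values and the 4 chains × 2,000 draws from the code above):

nchain, ndraw, nobs = 4, 2000, 500_000
print(nchain * ndraw * nobs * 8 / 1e9)  # 8 bytes per float64 -> 32.0 GB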

Note: PyMC3 currently uses ArviZ for plotting and diagnostics. The integration means plotting directly from traces works, but internally this converts the trace to an InferenceData object every time, which adds overhead (in this case, one of the tasks performed on every conversion is creating that (nchain, ndraw, 500k) array). The recommended workflow would be the following:

import arviz as az
import numpy as np
import pymc3 as pm

obs = np.hstack(my_data[3])

with pm.Model() as model:
    # Prior: the upper bound is the range of the dataset, multiplied by 5
    # to give a naive but simple diffuse prior
    sigma = pm.Uniform('sigma', lower=0, upper=5 * (obs.max() - obs.min()))

    # Likelihood
    estimate = pm.Normal('estimate', mu=obs.mean(), sigma=sigma, observed=obs)

    trace = pm.sample(cores=4, n_init=20000, draws=2000, tune=8000)
    idata = az.from_pymc3(trace)  # or az.from_pymc3(trace, log_likelihood=False)

az.summary(idata)     # equivalent to pm.summary(idata)
az.plot_trace(idata)  # equivalent to pm.traceplot(idata)