Jupyter kernel dies after sampling in PyMC3 training

Kalyan_Banik · December 29, 2021, 7:44am

Hello, I am using PyMC3 for probabilistic modeling. My notebook kernel dies after training reaches 100 percent completion and asks to be restarted.
Here is the code:

import warnings
import pymc3 as pm

warnings.filterwarnings('ignore')
# model, tau, idx, lambda_1, lambda_2 and count_dataframe are defined earlier in the notebook
with model:
    lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)
    observation = pm.Poisson('obs', lambda_, observed=count_dataframe)
    step = pm.Metropolis()
    trace = pm.sample(10000, tune=5000, step=step, return_inferencedata=False, cores=1)

The problem also occurred when I used the default of 4 cores, which runs 4 chains. I tried to work around it by setting cores=1, and I also updated the package below, but neither fixed the error:

~ conda update intel-openmp

I am new to PyMC3; kindly help me solve this problem. I am using macOS Big Sur, theano-pymc 1.1.2 and Python 3.7.4.
My dataset has 1 million data points.

cluhmann · December 29, 2021, 4:29pm

Do you know if there are any warnings being generated (right now you are suppressing everything)? If so, seeing what they are might help diagnose the problem.

Kalyan_Banik · December 29, 2021, 5:43pm

I got "py:226: RuntimeWarning: overflow encountered in exp" from PyMC3, but converting the data type to float128 solved that issue. After solving it, I am still facing this error.

cluhmann · December 29, 2021, 9:50pm

Two things I might suggest. First, I would export the notebook and try running it as an executable Python script. Second, if the error persists, I would post any warnings that are generated (i.e., remove the filter you are currently using) as well as any error messages.
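Instead of silencing everything with warnings.filterwarnings('ignore'), you can record the warnings a call produces and print them. The sketch below uses only the standard-library warnings module; run_and_collect_warnings is a hypothetical helper name, and the np.exp call merely stands in for whatever call (for example the pm.sample(...) line) you want to inspect.

```python
import warnings

import numpy as np


def run_and_collect_warnings(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, messages) instead of hiding warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        result = fn(*args, **kwargs)
    messages = [f"{w.category.__name__}: {w.message}" for w in caught]
    return result, messages


# Stand-in example: exp overflows in float64 for inputs above roughly 709.78.
result, messages = run_and_collect_warnings(np.exp, np.array([800.0]))
for message in messages:
    print(message)
```

To inspect the sampler you could wrap it the same way, e.g. run_and_collect_warnings(lambda: pm.sample(...)) inside the model context.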

Kalyan_Banik · December 30, 2021, 4:47am

After removing the warning filter, I get:
RuntimeWarning: overflow encountered in exp
"accept": np.exp(accept),
This message comes from pymc3/step_methods/metropolis.py:226.
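For context on what the warning means: in float64, exp(x) overflows to inf once x exceeds about 709.78. If, as the name suggests, accept holds the log acceptance ratio, the warning says a proposal's log-probability exceeded the current point's by more than roughly 710, which fits the suspicion that the model or priors are badly behaved. The sketch below shows the standard log-space Metropolis test, which never exponentiates the raw ratio; the function names are hypothetical and this is not claimed to be PyMC3's exact code.

```python
import math

import numpy as np

# Largest x for which exp(x) is still a finite float64 (about 709.78).
FLOAT64_MAX_LOG = math.log(np.finfo(np.float64).max)


def acceptance_probability(log_ratio: float) -> float:
    """Metropolis acceptance probability min(1, exp(log_ratio)) without overflow."""
    # min(0, log_ratio) <= 0, so its exponential lies in [0, 1] and cannot overflow.
    return float(np.exp(min(0.0, log_ratio)))


def metropolis_accept(log_ratio: float, rng: np.random.Generator) -> bool:
    """Accept or reject a proposal by comparing log(u) with the log acceptance ratio."""
    # Working in log space never exponentiates log_ratio, so huge values are harmless.
    return bool(np.log(rng.uniform()) < log_ratio)
```

Either form gives the same accept/reject decision as comparing u with min(1, exp(log_ratio)); they only avoid producing inf along the way.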

cluhmann · December 30, 2021, 6:35am

To me, that warning suggests something fundamental is wrong with the model and/or the priors (i.e., I suspect that the sampler is finding that different parameter values all yield the same posterior). So I would break your model/data down into something manageable and make sure that everything is working the way you expect before trying to crank out 10K samples with a million data points. You can also try some prior predictive checks to see how your prior+model behaves in the absence of data.
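A prior predictive check simulates data from the priors alone and asks whether it looks plausible. PyMC3 provides pm.sample_prior_predictive for the real model; the NumPy sketch below just spells out the generative story, assuming the switchpoint model that appears later in this thread (lambda_1, lambda_2 ~ Exponential(alpha) with alpha = 1/mean(counts), tau ~ DiscreteUniform(0, n - 1), and Poisson counts switching rate at tau). simulate_prior_predictive is a hypothetical helper, meant to be run on a small subsample of the data.

```python
import numpy as np


def simulate_prior_predictive(counts: np.ndarray, n_sims: int, seed: int = 0) -> np.ndarray:
    """Draw n_sims synthetic count series from the (assumed) switchpoint model's priors."""
    rng = np.random.default_rng(seed)
    n = counts.shape[0]
    alpha = 1.0 / counts.mean()  # same prior rate as alpha = 1 / count_dataframe.mean()
    idx = np.arange(n)
    sims = np.empty((n_sims, n), dtype=np.int64)
    for s in range(n_sims):
        # NumPy's exponential is parameterized by scale = 1 / rate.
        lam_1 = rng.exponential(scale=1.0 / alpha)
        lam_2 = rng.exponential(scale=1.0 / alpha)
        # integers(0, n) excludes n, matching DiscreteUniform(lower=0, upper=n - 1).
        tau = rng.integers(0, n)
        # Elementwise analogue of pm.math.switch(tau > idx, lambda_1, lambda_2).
        rate = np.where(tau > idx, lam_1, lam_2)
        sims[s] = rng.poisson(rate)
    return sims
```

Comparing, say, sims.mean(axis=1) and sims.max(axis=1) with the same statistics of the observed subsample is a quick way to spot priors that generate implausible data.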

Perhaps unrelatedly, I would allow pm.sample() to infer optimal step methods on its own:

trace=pm.sample(10000,
                tune=5000,
                return_inferencedata=False)

The sampler will automatically implement appropriate step methods for discrete parameters and continuous parameters (as well as combinations of discrete and continuous). In almost all cases, going with pymc’s defaults will be a better idea than crafting your own step methods.

OriolAbril · January 7, 2022, 5:04pm

I think the root of the kernel dying is a memory issue, but that doesn't mean there aren't other issues with your model too, as @cluhmann was pointing out.

You mentioned that you have 1 million data points; multiply that by the number of chains and draws (and by even more bytes per value when using a higher float resolution). By default PyMC computes and stores the pointwise log likelihood data, which is used for model comparison; this requires creating an array of that size (even if the model only has a handful of parameters). Try using:

pm.sample(..., return_inferencedata=True, idata_kwargs={"log_likelihood": False})

Using return_inferencedata=False might work at first, but then you'll probably run into the same problem when trying to plot the results or compute ESS/R-hat.
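To put the memory argument in numbers, here is a back-of-the-envelope sketch. It assumes the pointwise log likelihood is one value per observation, chain and kept draw (tuning draws excluded), using the figures from this thread: 1 million observations, 4 chains and 10,000 draws. pointwise_loglik_bytes is a hypothetical helper.

```python
def pointwise_loglik_bytes(n_obs: int, chains: int, draws: int, bytes_per_value: int = 8) -> int:
    """Size in bytes of a (chains, draws, n_obs) array of pointwise log-likelihood values."""
    return n_obs * chains * draws * bytes_per_value


# float64 uses 8 bytes per value; a 128-bit float type would use 16.
size_float64 = pointwise_loglik_bytes(1_000_000, 4, 10_000)
size_128bit = pointwise_loglik_bytes(1_000_000, 4, 10_000, bytes_per_value=16)
print(f"float64: {size_float64 / 1e9:.0f} GB")  # float64: 320 GB
print(f"128-bit: {size_128bit / 1e9:.0f} GB")  # 128-bit: 640 GB
```

Either figure is far beyond the RAM of a typical laptop, which is consistent with the kernel being killed once sampling finishes and the results are assembled.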

Kalyan_Banik · January 10, 2022, 10:36am

Well, I no longer get any error and my kernel keeps running. However, I am now facing a weird problem.

with pm.Model() as model:
    alpha=1/count_dataframe.mean()
    lambda_1=pm.Exponential('lambda_1',alpha)
    lambda_2=pm.Exponential('lambda_2',alpha)
    tau=pm.DiscreteUniform('tau',lower=0, upper=n_count_data -1)

I have already defined lambda_1 and lambda_2, and this is what printing lambda_1 shows:

idx=np.arange(n_count_data)
print(lambda_1)

lambda_1 ~ Exponential

Then I started sampling the way you suggested:

with model:
    lambda_= pm.math.switch(tau>idx,lambda_1,lambda_2)
    observation=pm.Poisson('obs',lambda_,observed=count_dataframe)
    #step=pm.Metropolis()
    trace=pm.sample(10000,tune=5000,return_inferencedata=True,idata_kwargs={"log_likelihood": False})

Up to this point everything works fine.
The problem starts here:
import matplotlib.pyplot as plt
#import matplotlib.pyplot.figsize

#trace
print(trace)
lambda_1_samples=trace['lambda_1']
print(lambda_1_samples)
#lambda_2_sample=trace['lambda_2']
#tau_samples= trace['tau']
#tau_samples
#lambda_2_samples

When I run the lambda_1_samples line (and the others), it shows:
KeyError: 'lambda_1'

I have already defined lambda_1, so why am I getting this error? Am I missing something here? I printed the trace;
here is the output:

Inference data with groups:
	> posterior
	> sample_stats
	> observed_data

What have I done wrong?

cluhmann · January 10, 2022, 2:49pm

Your approach would work with a MultiTrace object (which you get when you call pm.sample(return_inferencedata=False)). You have instead generated a trace that is stored in an InferenceData object (which will be the default starting in PyMC v4). So you need to do:

lambda_1_samples=trace.posterior['lambda_1']
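One difference worth knowing: trace.posterior['lambda_1'] keeps separate chain and draw dimensions, whereas the MultiTrace lookup gave a single 1-D array with the chains combined. If downstream code (e.g. a histogram) expects a flat array, trace.posterior['lambda_1'].values.ravel() should give one. The sketch below uses a hypothetical random array in place of the real posterior values, just to show the shapes.

```python
import numpy as np

# Hypothetical stand-in for trace.posterior["lambda_1"].values, with dims (chain, draw).
n_chains, n_draws = 4, 10_000
rng = np.random.default_rng(42)
posterior_lambda_1 = rng.exponential(scale=20.0, size=(n_chains, n_draws))

# ravel() flattens in C order: every draw of chain 0, then chain 1, and so on.
lambda_1_samples = posterior_lambda_1.ravel()
print(posterior_lambda_1.shape, lambda_1_samples.shape)  # (4, 10000) (40000,)
```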
