Hi, I’m having some trouble using Pymc3 . I have a model that depends on marginalizing over certain parameters, in such a way that it isn’t differentiable with respect to them (or at least, they didn’t play well with Theano when I tried, and I don’t know enough about Theano to attempt to fix it). Therefore, I have to create a whole bunch of different models using loops, and save them all to file for later aggregation and analysis. I know that this isn’t exactly “best practices” but it was all I could think to do.
My problem is that at random times, the kernel will just die completely, and is unable to be restarted. This is a huge problem because typically the whole process takes many hours, and I can’t just sit around babysitting it for when it breaks.
Here’s the raw data for the notebook (copypaste and save as .ipynb), and here’s the data csv it references. I’ve reproduced the problematic loop part here:
latency_mean = 12
latency_var = 5.433
latency_parameters = lognormal_parameters(latency_mean, latency_var)
stay_at_home = 1
saturated = False
n_samples = 500
n_tune = 1500
n_chains = 4
model_name = "model_sah_"+str(int(stay_at_home))+"_sat_"+str(int(saturated))+"_"
marginalization_mc = 100
for i in range(marginalization_mc):
print(i)
outbreak = Observed_Outbreak(patients_data)
outbreak.impute_latencies(latency_parameters)
outbreak.get_global_infectious()
outbreak.get_susceptible_states()
outbreak.get_group_infectious("Household_ID")
outbreak.get_group_infectious("Family_ID")
outbreak.get_group_infectious("Sex")
outbreak.get_classroom_infectious(1, infectious_override=stay_at_home)
outbreak.get_classroom_infectious(2, infectious_override=stay_at_home)
community_slab = outbreak.slabs["Community"]
c1_slab = outbreak.slabs["Classroom_1"]
c2_slab = outbreak.slabs["Classroom_2"]
household_slab = outbreak.slabs["Household_ID"]
family_slab = outbreak.slabs["Family_ID"]
sex_slab = outbreak.slabs["Sex"]
s_before = outbreak.states_before
s_after = outbreak.states_after
this_model = pm.Model()
with this_model:
rv_q_community = pm.Uniform("rv_q_community")
##rv_q_class_0 = pm.Uniform("rv_q_class_0")
rv_q_class_1 = pm.Uniform("rv_q_class_1")
rv_q_class_2 = pm.Uniform("rv_q_class_2")
rv_q_household = pm.Uniform("rv_q_household")
rv_q_family = pm.Uniform("rv_q_family")
rv_q_sex = pm.Uniform("rv_q_sex")
probabilities = ((rv_q_community**community_slab) *
(rv_q_class_1**c1_slab) *
(rv_q_class_2**c2_slab) *
(rv_q_household**household_slab) *
(rv_q_family**family_slab) *
(rv_q_sex**sex_slab))
## probability is zero if already not susceptible
probabilities = s_before * probabilities
s_observed = pm.Bernoulli("s_observed", p=probabilities, observed=s_after)
this_trace = pm.sample(draws=n_samples, cores=n_chains,
chains=n_chains, tune=n_tune)
##pm.traceplot(this_trace)
##print("Likelihood sampled using MCMC NUTS:")
##plt.show()
print(pm.summary(this_trace))
now = datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
dump_filename = "models/"+model_name+now+".pkl"
os.makedirs(os.path.dirname(dump_filename), exist_ok=True)
model_parameters = {"latency_parameters": latency_parameters,
"stay_at_home": stay_at_home,
"saturated": saturated,
"n_samples": n_samples,
"n_tunes": n_tune,
"n_chains": n_chains}
with open(dump_filename, "wb") as the_file:
pickle.dump({"model_parameters": model_parameters,
"latencies": outbreak.data_frame["Latency_Onset"],
"model": this_model,
"trace": this_trace}, the_file)
print(f"Model successfully dumped to file: {dump_filename}.")
print("Marginalization finished.")
I stress that ordinarily the model works absolutely fine – when I run this just a single time outside of a loop, it always samples properly and I get sensible posterior distributions for the parameters. The problem comes when I loop it many times. Typically it will break partway through sampling, e.g. here’s where it got to last time:
2
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [rv_q_sex, rv_q_family, rv_q_household, rv_q_class_2, rv_q_class_1,
rv_q_community]
Sampling 4 chains: 0%| | 14/8000 [00:01<24:22, 5.46draws/s]
After that, kernel crash. It doesn’t even give an error message. Sometimes it manages dozens of loops successfully, other times it breaks after the first few.
I’m using pymc3 version 3.5, installed using “conda install -c conda-forge pymc3”. Here’s a full list of my installed packages. How do I fix this?