My kernel keeps dying when trying to run this code, any ideas why?
Any help is greatly appreciated
I get the error message:
" Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure."
Code is below with example dataset attached.
import arviz as az
import numpy as np
import pandas as pd
import pymc as pm
import xarray as xr
from scipy.stats import multivariate_normal
df = pd.read_csv("swim_example_data.csv")
IDs_idx, IDs = pd.factorize(df.full_name_computed)
coords = {
"IDs":IDs
}
def censored_log_likelihood_ind_name_(mu, log_sigma, standardized, uncensored):
if np.sum(uncensored) == len(uncensored): ## if no times are censored
llh = pm.logp(pm.MvNormal.dist(mu=np.zeros(np.sum(uncensored)), cov=np.diag(np.ones(np.sum(uncensored)))), standardized)
else:
left_censored_name = (1-uncensored).astype(bool)
llh = multivariate_normal.logcdf(
standardized[left_censored_name].eval(),
np.zeros(np.sum(left_censored_name)), np.diag(np.ones(np.sum(left_censored_name))))
llh += pm.logp(pm.MvNormal.dist(mu=np.zeros(np.sum(uncensored)), cov=np.diag(np.ones(np.sum(uncensored)))),
standardized[uncensored])
return llh
with pm.Model(coords=coords) as rnd_effects_cens_loop:
mu = pm.Normal("mu", 0,1, dims = ["IDs"])
log_sigma = pm.Normal("log_sigma", np.log(0.5))
for id_ in IDs:
df_id = df[df.full_name_computed == id_]
## standardize the data so that each ID is on Normal(0,1)
standardized = np.array((df_id.Gaus_censored.values - mu[id_])/np.exp(log_sigma))
uncensored = df_id.uncensored
pm.Potential(id_, censored_log_likelihood_name_(mu[id_], log_sigma, standardized, uncensored))
I would begin by pulling your code out of the notebook and running it as a script (as suggested here). Doing so yields much more informative error messages, such as this:
File ~/mambaforge/envs/pymc/lib/python3.10/site-packages/aesara/tensor/var.py:48
8, in _tensor_py_operators.__getitem__(self, args)
480 index_dim_count += arg.ndim
481 else:
482 # Python arrays can contain a mixture of bools and integers,
483 # which requires complex rules to handle all special cases.
(...)
486 # indexing, it is safe to throw an error if we encounter
487 # any of these difficult cases.
--> 488 if includes_bool(arg):
489 raise TypeError(
490 "TensorType does not support Python bools "
491 "for indexing, such as tensor[[True, False]]. "
(...)
494 "tensor[numpy.array([True, False])]."
495 )
496 index_dim_count += 1
Looking at your code, you are trying to index into the parameter array mu using id_ and id_ is a string. So I suspect this is where the error is originating.
Not sure why you arenโt seeing any error messages, but here is (part of) what I get running your new code. So it looks like pymc is choking on the initial starting point.
File ~/mambaforge/envs/pymc/lib/python3.10/site-packages/pymc/sampling.py:2381, in _init_jitter(model, initvals, seeds, jitter, jitter_max_retries)
2379 if i < jitter_max_retries:
2380 try:
-> 2381 model.check_start_vals(point)
2382 except SamplingError:
2383 # Retry with a new seed
2384 seed = rng.randint(2**30, dtype=np.int64)
File ~/mambaforge/envs/pymc/lib/python3.10/site-packages/pymc/model.py:1791, in Model.check_start_vals(self, start)
1785 valid_keys = ", ".join(self.named_vars.keys())
1786 raise KeyError(
1787 "Some start parameters do not appear in the model!\n"
1788 f"Valid keys are: {valid_keys}, but {extra_keys} was supplied"
1789 )
-> 1791 initial_eval = self.point_logps(point=elem)
1793 if not all(np.isfinite(v) for v in initial_eval.values()):
1794 raise SamplingError(
1795 "Initial evaluation of model at starting point failed!\n"
1796 f"Starting values:\n{elem}\n\n"
1797 f"Initial evaluation results:\n{initial_eval}"
1798 )
I would spend some time checking gev_gaus_rnd_effects_cens_loop.initial_point() and see if you canโt make some headway about why this fails: