Chain Failure when running model

I am trying to use PyMC3 to build a regression model for the thermal conductivity of low-alloy steels. I have set up the model as follows:


```python
import pymc3 as pm
import arviz as az

# FeCCrMn_T is the pandas DataFrame holding the measurements
with pm.Model() as FeCCrMn_T_model:
    # custom prior for the intercept only
    mypriors = {'Intercept': pm.Normal.dist(mu=60, sd=100)}
    # pass the formula and the data frame to the GLM helper (TCCrMn model)
    pm.glm.GLM.from_formula('Conductivity ~ T + C + Cr + Mn + T:C + T:Cr + T:Mn',
                            FeCCrMn_T, priors=mypriors)

    # draw 2000 samples in each of 2 separate chains
    trace = pm.sample(2000, chains=2)
    prior = pm.sample_prior_predictive()
    posterior_predictive = pm.sample_posterior_predictive(trace)

    # save everything in an ArviZ InferenceData object for ease of analysis
    fit_CCrMnT = az.from_pymc3(trace=trace, prior=prior, posterior_predictive=posterior_predictive)
```
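Written out as equations, the GLM call above corresponds roughly to the model below. This is a sketch under assumptions: it takes the GLM's default Normal likelihood, whose noise scale is the `sd` variable in the NUTS list further down, reads the `T:C`, `T:Cr` and `T:Mn` terms in that list as products of the columns, and leaves the slope and `sd` priors at whatever defaults the GLM module applies, so they are not written out.

```latex
\text{Conductivity}_i \sim \mathcal{N}\!\left(\mu_i,\ \sigma\right), \qquad
\mu_i = \beta_0 + \beta_T T_i + \beta_C C_i + \beta_{\mathrm{Cr}} \mathrm{Cr}_i + \beta_{\mathrm{Mn}} \mathrm{Mn}_i
      + \beta_{T:C}\, T_i C_i + \beta_{T:\mathrm{Cr}}\, T_i \mathrm{Cr}_i + \beta_{T:\mathrm{Mn}}\, T_i \mathrm{Mn}_i,
\qquad \beta_0 \sim \mathcal{N}(\mu = 60,\ \sigma = 100)
```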

However, when I try to run this, I keep getting the following error:


```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sd, T:Mn, T:Cr, T:C, Mn, Cr, C, T, Intercept]
Sampling 2 chains: 14%|█▍ | 715/5000 [00:06<00:41, 104.49draws/s]

RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\Karan Mohindra\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 110, in run
    self._start_loop()
  File "C:\Users\Karan Mohindra\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 160, in _start_loop
    point, stats = self._compute_point()
  File "C:\Users\Karan Mohindra\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py", line 191, in _compute_point
    point, stats = self._step_method.step(self._point)
  File "C:\Users\Karan Mohindra\Anaconda3\lib\site-packages\pymc3\step_methods\arraystep.py", line 247, in step
    apoint, stats = self.astep(array)
  File "C:\Users\Karan Mohindra\Anaconda3\lib\site-packages\pymc3\step_methods\hmc\base_hmc.py", line 130, in astep
    self.potential.raise_ok(self._logp_dlogp_func._ordering.vmap)
  File "C:\Users\Karan Mohindra\Anaconda3\lib\site-packages\pymc3\step_methods\hmc\quadpotential.py", line 231, in raise_ok
    raise ValueError('\n'.join(errmsg))
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV Intercept.ravel()[0] is zero.
"""

The above exception was the direct cause of the following exception:

ValueError Traceback (most recent call last)
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV Intercept.ravel()[0] is zero.

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)
in
      6
      7 #do the sampling with 2000 iterations and 4 separate chains.
----> 8 trace = pm.sample(2000, chains=2,)
      9 prior = pm.sample_prior_predictive()
     10 posterior_predictive = pm.sample_posterior_predictive(trace)

~\Anaconda3\lib\site-packages\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, **kwargs)
    435         _print_step_hierarchy(step)
    436         try:
--> 437             trace = _mp_sample(**sample_args)
    438         except pickle.PickleError:
    439             _log.warning("Could not pickle model, sampling singlethreaded.")

~\Anaconda3\lib\site-packages\pymc3\sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, **kwargs)
    967     try:
    968         with sampler:
--> 969             for draw in sampler:
    970                 trace = traces[draw.chain - chain]
    971                 if (trace.supports_sampler_stats

~\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in __iter__(self)
    391
    392         while self._active:
--> 393             draw = ProcessAdapter.recv_draw(self._active)
    394             proc, is_last, draw, tuning, stats, warns = draw
    395             if self._progress is not None:

~\Anaconda3\lib\site-packages\pymc3\parallel_sampling.py in recv_draw(processes, timeout)
    295             else:
    296                 error = RuntimeError("Chain %s failed." % proc.chain)
--> 297             raise error from old_error
    298         elif msg[0] == "writing_done":
    299             proc._readable = True

RuntimeError: Chain 1 failed.
```
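To see what the error is complaining about, here is a simplified sketch (not PyMC3's actual code). It assumes only that `adapt_diag` builds its diagonal mass matrix from the per-parameter variance of the draws seen during tuning. A parameter whose draws never change gets a variance of exactly zero, which is the kind of zero diagonal entry the message reports for `Intercept`. The names `welford_variance` and `zero_diagonal_entries` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def welford_variance(draws: Sequence[float]) -> float:
    """Sample variance of `draws`, accumulated in one pass (Welford's algorithm)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in draws:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two draws to estimate a variance")
    return m2 / (n - 1)


def zero_diagonal_entries(draws_by_param: Dict[str, Sequence[float]]) -> List[str]:
    """Hypothetical check: names whose diagonal entry (std of the draws) would be zero."""
    stds = {name: math.sqrt(welford_variance(d)) for name, d in draws_by_param.items()}
    return [name for name, s in stds.items() if s == 0.0]


# A parameter stuck at one value (like Intercept in the error) versus one that moves.
tuning_draws = {"Intercept": [60.0] * 5, "T": [0.1, 0.3, 0.2]}
print(zero_diagonal_entries(tuning_draws))  # ['Intercept']
```

The real sampler adapts in windows and mixes in an initial guess, so this only illustrates why a stuck parameter produces a zero on the diagonal.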


What do you think I should change here?

Hi,
I’m not familiar with the GLM module, but I think your prior is misspecified – see this tutorial.
Also, I don't know your data, but your prior's sd looks very large, and an overly wide prior can make sampling difficult.
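To put a number on how wide the intercept prior is, here is a quick check using only the Python standard library. Under Normal(mu=60, sd=100), about 27% of the prior mass lies below zero, which is not physical if you read the intercept as a baseline thermal conductivity. The sd=20 prior is a hypothetical comparison, not a value suggested in this thread.

```python
from statistics import NormalDist

# Intercept prior from the question: Normal(mu=60, sd=100).
wide_prior = NormalDist(mu=60, sigma=100)
# Hypothetical tighter prior, for comparison only.
tight_prior = NormalDist(mu=60, sigma=20)


def prob_negative(prior: NormalDist) -> float:
    """Prior probability that the intercept is below zero."""
    return prior.cdf(0.0)


def central_interval(prior: NormalDist, mass: float = 0.95) -> tuple:
    """Equal-tailed interval containing `mass` of the prior probability."""
    tail = (1.0 - mass) / 2.0
    return prior.inv_cdf(tail), prior.inv_cdf(1.0 - tail)


for name, prior in [("sd=100", wide_prior), ("sd=20", tight_prior)]:
    lo, hi = central_interval(prior)
    print(f"{name}: P(intercept < 0) = {prob_negative(prior):.3f}, "
          f"95% interval = ({lo:.0f}, {hi:.0f})")
```

For sd=100 this gives P(intercept < 0) ≈ 0.274 and a 95% interval of about (-136, 256), far wider than the range of values the intercept could plausibly take.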
Hope this helps 🖖