I am building a family of models for different logic gates implemented in biological cells. I have six models, one per gate (AND, OR, NOT, NAND, XOR, and XNOR). All but the XNOR model work fine. However, when I try to train the XNOR model (training here means learning the actual continuous output response, starting from a prior that captures the intended behavior), a chain fails and PyMC3 errors out. Here's the traceback, but I'm afraid I can't translate it into guidance on what went wrong and how to fix it:

```
pymc3.parallel_sampling.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 73, in run
self._start_loop()
File "/home/ubuntu/.local/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 113, in _start_loop
point, stats = self._compute_point()
File "/home/ubuntu/.local/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 139, in _compute_point
point, stats = self._step_method.step(self._point)
File "/home/ubuntu/.local/lib/python3.6/site-packages/pymc3/step_methods/arraystep.py", line 247, in step
apoint, stats = self.astep(array)
File "/home/ubuntu/.local/lib/python3.6/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 115, in astep
self.potential.raise_ok(self._logp_dlogp_func._ordering.vmap)
File "/home/ubuntu/.local/lib/python3.6/site-packages/pymc3/step_methods/hmc/quadpotential.py", line 201, in raise_ok
raise ValueError('\n'.join(errmsg))
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `sigma_Input11_log__`.ravel()[0] is zero.
The derivative of RV `sigma_Input11_log__`.ravel()[1] is zero.
The derivative of RV `sigma_Input11_log__`.ravel()[2] is zero.
The derivative of RV `mu_Input11_lowerbound__`.ravel()[0] is zero.
The derivative of RV `mu_Input11_lowerbound__`.ravel()[1] is zero.
The derivative of RV `mu_Input11_lowerbound__`.ravel()[2] is zero.
The derivative of RV `sigma_Input10_log__`.ravel()[0] is zero.
The derivative of RV `sigma_Input10_log__`.ravel()[1] is zero.
The derivative of RV `sigma_Input10_log__`.ravel()[2] is zero.
The derivative of RV `mu_Input10_lowerbound__`.ravel()[0] is zero.
The derivative of RV `mu_Input10_lowerbound__`.ravel()[1] is zero.
The derivative of RV `mu_Input10_lowerbound__`.ravel()[2] is zero.
The derivative of RV `sigma_Input01_log__`.ravel()[0] is zero.
The derivative of RV `sigma_Input01_log__`.ravel()[1] is zero.
The derivative of RV `sigma_Input01_log__`.ravel()[2] is zero.
The derivative of RV `mu_Input01_lowerbound__`.ravel()[0] is zero.
The derivative of RV `mu_Input01_lowerbound__`.ravel()[1] is zero.
The derivative of RV `mu_Input01_lowerbound__`.ravel()[2] is zero.
The derivative of RV `sigma_Input00_log__`.ravel()[0] is zero.
The derivative of RV `sigma_Input00_log__`.ravel()[1] is zero.
The derivative of RV `sigma_Input00_log__`.ravel()[2] is zero.
The derivative of RV `mu_Input00_lowerbound__`.ravel()[0] is zero.
The derivative of RV `mu_Input00_lowerbound__`.ravel()[1] is zero.
The derivative of RV `mu_Input00_lowerbound__`.ravel()[2] is zero.
The derivative of RV `hyper_mu_mu_Input00_lowerbound__`.ravel()[0] is zero.
"""
The above exception was the direct cause of the following exception:
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV `sigma_Input11_log__`.ravel()[0] is zero.
The derivative of RV `sigma_Input11_log__`.ravel()[1] is zero.
The derivative of RV `sigma_Input11_log__`.ravel()[2] is zero.
The derivative of RV `mu_Input11_lowerbound__`.ravel()[0] is zero.
The derivative of RV `mu_Input11_lowerbound__`.ravel()[1] is zero.
The derivative of RV `mu_Input11_lowerbound__`.ravel()[2] is zero.
The derivative of RV `sigma_Input10_log__`.ravel()[0] is zero.
The derivative of RV `sigma_Input10_log__`.ravel()[1] is zero.
The derivative of RV `sigma_Input10_log__`.ravel()[2] is zero.
The derivative of RV `mu_Input10_lowerbound__`.ravel()[0] is zero.
The derivative of RV `mu_Input10_lowerbound__`.ravel()[1] is zero.
The derivative of RV `mu_Input10_lowerbound__`.ravel()[2] is zero.
The derivative of RV `sigma_Input01_log__`.ravel()[0] is zero.
The derivative of RV `sigma_Input01_log__`.ravel()[1] is zero.
The derivative of RV `sigma_Input01_log__`.ravel()[2] is zero.
The derivative of RV `mu_Input01_lowerbound__`.ravel()[0] is zero.
The derivative of RV `mu_Input01_lowerbound__`.ravel()[1] is zero.
The derivative of RV `mu_Input01_lowerbound__`.ravel()[2] is zero.
The derivative of RV `sigma_Input00_log__`.ravel()[0] is zero.
The derivative of RV `sigma_Input00_log__`.ravel()[1] is zero.
The derivative of RV `sigma_Input00_log__`.ravel()[2] is zero.
The derivative of RV `mu_Input00_lowerbound__`.ravel()[0] is zero.
The derivative of RV `mu_Input00_lowerbound__`.ravel()[1] is zero.
The derivative of RV `mu_Input00_lowerbound__`.ravel()[2] is zero.
The derivative of RV `hyper_mu_mu_Input00_lowerbound__`.ravel()[0] is zero.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "Three_Layer_Analysis.py", line 309, in <module>
main()
File "Three_Layer_Analysis.py", line 304, in main
(entropy, model) = do_main(gate, trace_name=trace_name)
File "Three_Layer_Analysis.py", line 281, in do_main
cores=cores, tune=1000)
File "/home/ubuntu/.local/lib/python3.6/site-packages/pymc3/sampling.py", line 440, in sample
trace = _mp_sample(**sample_args)
File "/home/ubuntu/.local/lib/python3.6/site-packages/pymc3/sampling.py", line 990, in _mp_sample
for draw in sampler:
File "/home/ubuntu/.local/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 305, in __iter__
draw = ProcessAdapter.recv_draw(self._active)
File "/home/ubuntu/.local/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 223, in recv_draw
six.raise_from(RuntimeError('Chain %s failed.' % proc.chain), old)
File "<string>", line 3, in raise_from
RuntimeError: Chain 3 failed.
```

One initial question: is a single failed chain a fatal error, or is it something that can just happen every now and then? Should I try to work around it by recovering from the error (perhaps by discarding the failing chain)? Or does it indicate a major problem in the parameterization of the model? You will see from the traceback that a number of bounded variables are involved (normals that are constrained to be greater than zero).
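
For anyone decoding the variable names in the traceback: as far as I understand, the `_lowerbound__` and `_log__` suffixes mark the unconstrained, transformed versions of the bounded variables, which are what the sampler actually steps over. Below is a minimal plain-Python sketch of those two transforms. The function names are hypothetical and are not PyMC3's API; the lower bound of zero is an assumption that matches my "greater than zero" constraint.

```python
import math


def lowerbound_forward(x, lower=0.0):
    """Map a value constrained to x > lower onto the unconstrained real line."""
    return math.log(x - lower)


def lowerbound_backward(y, lower=0.0):
    """Map an unconstrained value back to the constrained space (result > lower)."""
    return lower + math.exp(y)


def log_forward(x):
    """Map a strictly positive value (e.g. a sigma) onto the real line."""
    return math.log(x)


def log_backward(y):
    """Map an unconstrained value back to a strictly positive one."""
    return math.exp(y)


# Round trip: a constrained mu survives the trip through unconstrained space.
mu = 2.5
mu_unconstrained = lowerbound_forward(mu)
assert math.isclose(lowerbound_backward(mu_unconstrained), mu)

# Any finite unconstrained value of moderate size maps to a point above the bound.
assert lowerbound_backward(-5.0) > 0.0
```

So an entry such as `mu_Input11_lowerbound__` would be the unconstrained counterpart of my bounded `mu_Input11`, and `sigma_Input11_log__` the log-transformed counterpart of `sigma_Input11`.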