Yes, but I was actually thinking of adding a boolean property, `impute_values`, to the `Model` object on creation, and producing an error rather than a warning if there are missing data and `impute_values` is `False`.
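To make the proposal concrete, here is a minimal sketch of the check I have in mind. It is plain Python and not part of PyMC3: the names `MissingDataError`, `count_missing` and `check_observations` are hypothetical, and a real version would live on the `Model` object and inspect masked arrays rather than lists.

```python
import math


class MissingDataError(ValueError):
    """Hypothetical error: data have missing values but imputation is off."""


def count_missing(values):
    """Count entries that are None or NaN."""
    return sum(
        1
        for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )


def check_observations(values, impute_values=False):
    """Return the number of missing entries, or raise if imputation is disabled.

    `impute_values` stands in for the proposed boolean property on the model.
    """
    n_missing = count_missing(values)
    if n_missing and not impute_values:
        raise MissingDataError(
            "%d missing value(s) found and impute_values is False" % n_missing
        )
    return n_missing
```

With this, `check_observations(data, impute_values=True)` would let imputation proceed, while the default would fail early instead of only warning.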
One thing I don’t understand is that it didn’t seem like my data got imputed, but I may be wrong. In the presence of missing data, my colleague ran the model and sent me this error message:
```
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
process_plan: r1bbktv6x4xke
Processing: plan_attributes.json
Found 93 Records in r1bbktv6x4xke
Processing r1bbktv6x4xke 7.5e-05 UWBF_NOR
Trace directory is: gander-data/UWBF_NOR/trace
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [output (1, 1)/5_missing, output (1, 1)/4_missing, output (1, 1)/3_missing, output (1, 1)/2_missing, output (1, 1)/1_missing, output (1, 1)/0_missing, output (0, 0)/5_missing, output (0, 0)/4_missing, output (0, 0)/3_missing, output (0, 0)/2_missing, output (0, 0)/1_missing, output (0, 0)/0_missing, output (1, 0)/5_missing, output (1, 0)/4_missing, output (1, 0)/3_missing, output (1, 0)/2_missing, output (1, 0)/1_missing, output (1, 0)/0_missing, output (0, 1)/5_missing, output (0, 1)/4_missing, output (0, 1)/3_missing, output (0, 1)/2_missing, output (0, 1)/1_missing, output (0, 1)/0_missing, sigma_(1, 1), sigma_(0, 0), sigma_(1, 0), sigma_(0, 1), mu_(1, 1) offset, mu_(0, 0) offset, mu_(1, 0) offset, mu_(0, 1) offset, hyper_sigma_sigma_(1, 1), hyper_sigma_sigma_(0, 0), hyper_sigma_sigma_(1, 0), hyper_sigma_sigma_(0, 1), hyper_sigma_mu_(1, 1), hyper_sigma_mu_(0, 0), hyper_sigma_mu_(1, 0), hyper_sigma_mu_(0, 1), hyper_mu_sigma_(1, 1), hyper_mu_sigma_(0, 0), hyper_mu_sigma_(1, 0), hyper_mu_sigma_(0, 1), hyper_mu_mu_(1, 1), hyper_mu_mu_(0, 0), hyper_mu_mu_(1, 0), hyper_mu_mu_(0, 1)]
Sampling 4 chains: 0%| | 0/4000 [00:00<?, ?draws/s]
pymc3.parallel_sampling.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 73, in run
    self._start_loop()
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 113, in _start_loop
    point, stats = self._compute_point()
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 139, in _compute_point
    point, stats = self._step_method.step(self._point)
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/step_methods/arraystep.py", line 247, in step
    apoint, stats = self.astep(array)
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 117, in astep
    'might be misspecified.' % start.energy)
ValueError: Bad initial energy: inf. The model might be misspecified.
"""

The above exception was the direct cause of the following exception:

ValueError: Bad initial energy: inf. The model might be misspecified.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home1/05426/plotnick/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home1/05426/plotnick/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home1/05426/plotnick/xplan/xplan-to-autoprotocol-reactor/helpers/gander.py", line 16, in <module>
    tp_main("UWBF_NOR", max_plans=1, od=7.5e-5, train=train)
  File "/home1/05426/plotnick/xplan/xplan-to-autoprotocol-reactor/helpers/train_prior.py", line 337, in main
    all_models.append(train(_gate, ygdata))
  File "/home1/05426/plotnick/xplan/xplan-to-autoprotocol-reactor/helpers/gander.py", line 12, in train
    model = make_model(gate, data)
  File "/home1/05426/plotnick/xplan-experiment-analysis/ygmodel.py", line 580, in make_model
    nuts_kwargs=nuts_kwargs)
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/sampling.py", line 449, in sample
    trace = _mp_sample(**sample_args)
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/sampling.py", line 999, in _mp_sample
    for draw in sampler:
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 305, in __iter__
    draw = ProcessAdapter.recv_draw(self._active)
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 223, in recv_draw
    six.raise_from(RuntimeError('Chain %s failed.' % proc.chain), old)
  File "<string>", line 3, in raise_from
RuntimeError: Chain 1 failed.
```
And when I used graphviz to plot the model, my `foo_missing` nodes were displayed as parents of the observation node, rather than children, and were labeled `Missing Distribution`. So maybe there’s something additional one must do to cause the imputation to actually happen?
Here’s a snippet from the graphviz figure: