Yes, but I was actually thinking of adding a boolean property, `impute_values`, to the Model object on creation, and raising an error rather than a warning if there are missing data and `impute_values` is False.
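To make the proposal concrete, a rough sketch of the check might look like the following. Everything here is hypothetical: `check_missing` and `MissingDataError` are illustrative names, not part of PyMC3, and a real version would presumably live on the Model object and inspect the observed arrays rather than a plain list.

```python
import math
import warnings


class MissingDataError(ValueError):
    """Hypothetical error: data has missing values and imputation is disabled."""


def check_missing(values, impute_values=False):
    """Return the indices of missing entries (None or NaN) in `values`.

    If any are found, warn when impute_values is True; otherwise raise.
    """
    missing = [
        i for i, v in enumerate(values)
        if v is None or (isinstance(v, float) and math.isnan(v))
    ]
    if missing and not impute_values:
        raise MissingDataError(
            "%d missing value(s) found and impute_values is False" % len(missing)
        )
    if missing:
        warnings.warn("%d missing value(s) will be imputed" % len(missing))
    return missing
```

With `impute_values=True` the caller gets a warning and the positions of the missing entries; with the default `False`, the same data fails fast instead of silently being imputed.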
One thing I don’t understand is that it didn’t seem like my data got imputed, but I may be wrong. In the presence of missing data, my colleague ran the model and sent me this error message:
```
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
process_plan: r1bbktv6x4xke
Processing: plan_attributes.json
Found 93 Records in r1bbktv6x4xke
Processing r1bbktv6x4xke 7.5e-05 UWBF_NOR
Trace directory is: gander-data/UWBF_NOR/trace
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [output (1, 1)/5_missing, output (1, 1)/4_missing, output (1, 1)/3_missing, output (1, 1)/2_missing, output (1, 1)/1_missing, output (1, 1)/0_missing, output (0, 0)/5_missing, output (0, 0)/4_missing, output (0, 0)/3_missing, output (0, 0)/2_missing, output (0, 0)/1_missing, output (0, 0)/0_missing, output (1, 0)/5_missing, output (1, 0)/4_missing, output (1, 0)/3_missing, output (1, 0)/2_missing, output (1, 0)/1_missing, output (1, 0)/0_missing, output (0, 1)/5_missing, output (0, 1)/4_missing, output (0, 1)/3_missing, output (0, 1)/2_missing, output (0, 1)/1_missing, output (0, 1)/0_missing, sigma_(1, 1), sigma_(0, 0), sigma_(1, 0), sigma_(0, 1), mu_(1, 1) offset, mu_(0, 0) offset, mu_(1, 0) offset, mu_(0, 1) offset, hyper_sigma_sigma_(1, 1), hyper_sigma_sigma_(0, 0), hyper_sigma_sigma_(1, 0), hyper_sigma_sigma_(0, 1), hyper_sigma_mu_(1, 1), hyper_sigma_mu_(0, 0), hyper_sigma_mu_(1, 0), hyper_sigma_mu_(0, 1), hyper_mu_sigma_(1, 1), hyper_mu_sigma_(0, 0), hyper_mu_sigma_(1, 0), hyper_mu_sigma_(0, 1), hyper_mu_mu_(1, 1), hyper_mu_mu_(0, 0), hyper_mu_mu_(1, 0), hyper_mu_mu_(0, 1)]
Sampling 4 chains: 0%| | 0/4000 [00:00<?, ?draws/s]
pymc3.parallel_sampling.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 73, in run
    self._start_loop()
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 113, in _start_loop
    point, stats = self._compute_point()
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 139, in _compute_point
    point, stats = self._step_method.step(self._point)
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/step_methods/arraystep.py", line 247, in step
    apoint, stats = self.astep(array)
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 117, in astep
    'might be misspecified.' % start.energy)
ValueError: Bad initial energy: inf. The model might be misspecified.
"""
The above exception was the direct cause of the following exception:
ValueError: Bad initial energy: inf. The model might be misspecified.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home1/05426/plotnick/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home1/05426/plotnick/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home1/05426/plotnick/xplan/xplan-to-autoprotocol-reactor/helpers/gander.py", line 16, in <module>
    tp_main("UWBF_NOR", max_plans=1, od=7.5e-5, train=train)
  File "/home1/05426/plotnick/xplan/xplan-to-autoprotocol-reactor/helpers/train_prior.py", line 337, in main
    all_models.append(train(_gate, ygdata))
  File "/home1/05426/plotnick/xplan/xplan-to-autoprotocol-reactor/helpers/gander.py", line 12, in train
    model = make_model(gate, data)
  File "/home1/05426/plotnick/xplan-experiment-analysis/ygmodel.py", line 580, in make_model
    nuts_kwargs=nuts_kwargs)
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/sampling.py", line 449, in sample
    trace = _mp_sample(**sample_args)
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/sampling.py", line 999, in _mp_sample
    for draw in sampler:
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 305, in __iter__
    draw = ProcessAdapter.recv_draw(self._active)
  File "/home1/05426/plotnick/lib/python3.6/site-packages/pymc3/parallel_sampling.py", line 223, in recv_draw
    six.raise_from(RuntimeError('Chain %s failed.' % proc.chain), old)
  File "<string>", line 3, in raise_from
RuntimeError: Chain 1 failed.
```
And when I used graphviz to plot the model, my `foo_missing` nodes were displayed as parents of the observation node, rather than children, and were labeled Missing Distribution. So maybe there’s something additional one must do to cause the imputation to actually happen?
Here’s a snippet from the graphviz figure: