Can't get sample_prior_predictive to work with missing values

Hi all,

I have a regression model with a number of missing values, and I am trying to run prior predictive checks using `pm.sample_prior_predictive`. However, every time I get "ValueError: array is not broadcastable to correct shape" (full error below). I have tried restricting sampling to just a few variables with the `var_names` parameter, but I get the same error. Sampling from the posterior works fine. Can anyone help me with this or suggest a workaround? This is my first time using PyMC3 or working with any Bayesian model. Thank you in advance!

Seems related: Can't seem to `sample_prior_predictive` on model with missing value imputation

ValueError: array is not broadcastable to correct shape
Apply node that caused the error: AdvancedIncSubtensor1{no_inplace,set}(TensorConstant{[1.00e+20 .. 9.00e+01]}, OOB Config_missing, TensorConstant{[ 0 1 3 .. 61 62 64]})
Toposort index: 10
Inputs types: [TensorType(float64, vector), TensorType(float64, vector), TensorType(int64, vector)]
Inputs shapes: [(72,), (72,), (14,)]
Inputs strides: [(8,), (8,), (8,)]
Inputs values: ['not shown', 'not shown', 'not shown']
Outputs clients: [[Elemwise{Composite{(i0 + (i1 * i2) + (i3 * i4) + (i5 * i6) + (i7 * i8) + (i9 * i10) + (i11 * i12) + (i13 * i14) + (i15 * i16) + (i17 * i18) + (i19 * i20) + (i21 * i22) + (maximum(maximum(maximum(maximum(maximum(maximum(maximum(maximum(maximum(maximum(i23, i24), i25), i26), i27), i28), i29), i30), i31), i32), i33) * i34))}}[(0, 24)](InplaceDimShuffle{x}.0, InplaceDimShuffle{x}.0, DockAfterRNI, Site_DC, InplaceDimShuffle{x}.0, Site_Edge, InplaceDimShuffle{x}.0, Site_RNG, InplaceDimShuffle{x}.0, Site_SuperNode, InplaceDimShuffle{x}.0, Sub Region_AMERWEST, InplaceDimShuffle{x}.0, Sub Region_APAC, InplaceDimShuffle{x}.0, Sub Region_EMEA, InplaceDimShuffle{x}.0, ClassType_Class B, InplaceDimShuffle{x}.0, ClassType_Class C, InplaceDimShuffle{x}.0, ClassType_Class E, InplaceDimShuffle{x}.0, TensorConstant{(1,) of 0.0}, OOB Config, MGFX Config, Azn Optical Config, Port Config for Optical, WAN Final Config, SWAN Final Config, WAN TER Final Config, WAN IER Final Config, Fabric Config, Optical Acceptance, InplaceDimShuffle{x}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3145, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3337, in run_ast_nodes
    if (await self.run_code(code, result, async_=asy)):
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-20-572dafd8db9a>", line 23, in <module>
    WOVars[feature] = pm.Normal(feature, mu=starting_mu, sigma=starting_sigma, observed=df[feature])
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymc3/distributions/distribution.py", line 83, in __new__
    return model.Var(name, dist, data, total_size, dims=dims)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymc3/model.py", line 1112, in Var
    var = ObservedRV(
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymc3/model.py", line 1737, in __init__
    data = as_tensor(data, name, model, distribution)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymc3/model.py", line 1683, in as_tensor
    dataTensor = tt.set_subtensor(constant[data.mask.nonzero()], missing_values)

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
During handling of the above exception, another exception occurred:
ValueError                                Traceback (most recent call last)
<ipython-input-46-e140f9e8d68c> in <module>
      3 # # az.concat(trace, az.from_pymc3(posterior_predictive=post_pred), inplace=True)
      4 with model:
----> 5     prior = pm.sample_prior_predictive()
      6 az.plot_ppc(az.from_pymc3(prior=prior, model=model))
      7 
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymc3/sampling.py in sample_prior_predictive(samples, model, vars, var_names, random_seed)
   1955     names = get_default_varnames(vars_, include_transformed=False)
   1956     # draw_values fails with auto-transformed variables. transform them later!
-> 1957     values = draw_values([model[name] for name in names], size=samples)
   1958 
   1959     data = {k: v for k, v in zip(names, values)}
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymc3/distributions/distribution.py in draw_values(params, point, size)
    649                     # This may fail for autotransformed RVs, which don't
    650                     # have the random method
--> 651                     value = _draw_value(next_,
    652                                         point=point,
    653                                         givens=temp_givens,
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymc3/distributions/distribution.py in _draw_value(param, point, givens, size)
    841                     # we don't want to store these drawn values to the context
    842                     with _DrawValuesContextBlocker():
--> 843                         val = np.atleast_1d(dist_tmp.random(point=point,
    844                                                             size=None))
    845                     # Sometimes point may change the size of val but not the
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymc3/distributions/continuous.py in random(self, point, size)
    511         array
    512         """
--> 513         mu, tau, _ = draw_values([self.mu, self.tau, self.sigma],
    514                                  point=point, size=size)
    515         return generate_samples(stats.norm.rvs, loc=mu, scale=tau**-0.5,
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymc3/distributions/distribution.py in draw_values(params, point, size)
    693                                         drawn[(node, size)]
    694                                     )
--> 695                         value = _draw_value(param,
    696                                             point=point,
    697                                             givens=givens.values(),
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymc3/distributions/distribution.py in _draw_value(param, point, givens, size)
    874                 input_vals = []
    875             func = _compile_theano_function(param, input_vars)
--> 876             output = func(*input_vals)
    877             return output
    878     raise ValueError('Unexpected type in draw_value: %s' % type(param))
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/numpy/lib/function_base.py in __call__(self, *args, **kwargs)
   2106             vargs.extend([kwargs[_n] for _n in names])
   2107 
-> 2108         return self._vectorize_call(func=func, args=vargs)
   2109 
   2110     def _get_ufunc_and_otypes(self, func, args):
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/numpy/lib/function_base.py in _vectorize_call(self, func, args)
   2180         """Vectorized call to `func` over positional `args`."""
   2181         if self.signature is not None:
-> 2182             res = self._vectorize_call_with_signature(func, args)
   2183         elif not args:
   2184             res = func()
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/numpy/lib/function_base.py in _vectorize_call_with_signature(self, func, args)
   2221 
   2222         for index in np.ndindex(*broadcast_shape):
-> 2223             results = func(*(arg[index] for arg in args))
   2224 
   2225             n_results = len(results) if isinstance(results, tuple) else 1
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    912                 if hasattr(self.fn, 'thunks'):
    913                     thunk = self.fn.thunks[self.fn.position_of_error]
--> 914                 gof.link.raise_with_op(
    915                     node=self.fn.nodes[self.fn.position_of_error],
    916                     thunk=thunk,
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)
    326 
    327 
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/six.py in reraise(tp, value, tb)
    716                 value = tp()
    717             if value.__traceback__ is not tb:
--> 718                 raise value.with_traceback(tb)
    719             raise value
    720         finally:
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    901         try:
    902             outputs =\
--> 903                 self.fn() if output_subset is None else\
    904                 self.fn(output_subset=output_subset)
    905         except Exception:
ValueError: array is not broadcastable to correct shape
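Reading the failing node: `AdvancedIncSubtensor1{no_inplace,set}` is the `tt.set_subtensor(constant[data.mask.nonzero()], missing_values)` call from `as_tensor`, and the reported input shapes are (72,) for the data column, (72,) for `OOB Config_missing`, and (14,) for the indices of the missing entries. One plausible interpretation (not confirmed in this thread) is that the prior draw for `OOB Config_missing` came back with the length of the whole column instead of one value per missing entry. The NumPy sketch below assumes `set_subtensor` behaves like fancy-index assignment on a copy; the helper name and the index positions are hypothetical, chosen only to illustrate why those shapes cannot be combined:

```python
import numpy as np


def set_subtensor(base, idx, values):
    """Hypothetical NumPy analogue of Theano's tt.set_subtensor(base[idx], values):
    return a copy of `base` with the positions in `idx` overwritten by `values`."""
    out = base.copy()
    out[idx] = values  # raises ValueError if `values` cannot broadcast to len(idx)
    return out


n_obs, n_missing = 72, 14        # shapes reported under "Inputs shapes"
base = np.zeros(n_obs)           # stands in for the observed column
idx = np.arange(n_missing)       # hypothetical positions of the NaNs

# Works: exactly one value per missing index.
filled = set_subtensor(base, idx, np.ones(n_missing))

# Fails: a full-length (72,) draw cannot be written into 14 slots.
try:
    set_subtensor(base, idx, np.ones(n_obs))
except ValueError as err:
    print("ValueError:", err)
```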

Yep, that sounds the same as what I found a little while ago:

As per @ricardoV94's note, the issue seems to be avoidable by using `shape` rather than `dims`.


Hi Jon, thanks for the reply. However, my model uses neither `shape` nor `dims`. Do you know of any workaround? For now I am checking the parameters of each prior manually, sampling from them with NumPy, and comparing the draws to my data, but this could get very tedious if I make the model more complex.
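The manual NumPy workaround can at least be kept in one function, so that adding predictors does not mean rewriting every check by hand. A minimal sketch, assuming a plain linear regression with hypothetical priors (alpha ~ Normal(0, 10), each beta ~ Normal(0, 1), sigma ~ HalfNormal(1)); these are not the priors from the original model, and the function name and design matrix are illustrative only:

```python
import numpy as np


def prior_predictive_linear(X, n_draws=500, seed=0):
    """Hypothetical helper: prior predictive draws for
    y = alpha + X @ beta + Normal(0, sigma), with assumed priors
    alpha ~ Normal(0, 10), beta_j ~ Normal(0, 1), sigma ~ HalfNormal(1)."""
    rng = np.random.default_rng(seed)
    n_obs, n_features = X.shape
    alpha = rng.normal(0.0, 10.0, size=n_draws)                # (n_draws,)
    beta = rng.normal(0.0, 1.0, size=(n_draws, n_features))    # (n_draws, n_features)
    sigma = np.abs(rng.normal(0.0, 1.0, size=n_draws))         # HalfNormal via |Normal|
    mu = alpha[:, None] + beta @ X.T                           # (n_draws, n_obs)
    y = rng.normal(mu, sigma[:, None])                         # (n_draws, n_obs)
    return {"alpha": alpha, "beta": beta, "sigma": sigma, "y": y}


# Usage with a hypothetical, fully observed design matrix.
X = np.random.default_rng(1).normal(size=(72, 3))
draws = prior_predictive_linear(X)
print(draws["y"].shape)  # (500, 72)
```

This assumes `X` has no NaNs; a predictor with missing values would need its own prior draws for the missing entries before computing `mu`.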

Hmmm, are you able to share the model code? Might make it easier to grok and suggest a solution.

In the meantime, have you tried explicitly stating the shape?

Cheers, Jon

Thanks for the reply, @jonsedar. Here is a stripped-down version of my code. It works fine with the `pm.sample_prior_predictive` line commented out. `df["OOB Config"]` is my variable with missing values. I have tried specifying `shape=1`, but that just gives a slightly different error.

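import pymc3 as pm

# df is a pandas DataFrame; df["OOB Config"] contains NaNs, which PyMC3
# imputes automatically through a latent "OOB_missing" variable.
test = pm.Model()
with test:
    OOB = pm.Normal("OOB", mu=df["OOB Config"].mean(), sigma=10.0, observed=df["OOB Config"])
    noise = pm.Normal("eps", mu=0, sigma=1.0)
    y = pm.Deterministic("y", OOB + noise)
    trace = pm.sample(10)                 # posterior sampling works
    prior = pm.sample_prior_predictive()  # fails; runs fine when this line is commented out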

Best,

-Brendan
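Until the missing-value case works with `pm.sample_prior_predictive`, the stripped-down model above is small enough to simulate by hand. A minimal NumPy sketch, assuming the same priors as that code (mean of the observed column, sigma 10, noise Normal(0, 1)); the function name and the toy column are hypothetical. It returns a full simulated replicate of the column for comparison with the data, plus imputations for the NaN positions only, with one value per missing entry per draw:

```python
import numpy as np


def prior_draws_stripped(oob, n_draws=500, sigma=10.0, seed=0):
    """Hypothetical hand-rolled prior draws for the stripped-down model:
    OOB ~ Normal(mean(oob), sigma) with NaNs in `oob` imputed,
    eps ~ Normal(0, 1), y = OOB + eps."""
    rng = np.random.default_rng(seed)
    oob = np.asarray(oob, dtype=float)
    missing = np.isnan(oob)
    mu = np.nanmean(oob)  # like pandas .mean(), which skips NaN by default

    # Simulated replicate of the whole column (what to compare with the data).
    oob_sim = rng.normal(mu, sigma, size=(n_draws, oob.size))

    # Imputations: one draw per missing entry only, shape (n_draws, n_missing).
    oob_missing = rng.normal(mu, sigma, size=(n_draws, int(missing.sum())))
    oob_filled = np.tile(oob, (n_draws, 1))   # (n_draws, n_obs)
    oob_filled[:, missing] = oob_missing      # the set_subtensor step; shapes match

    eps = rng.normal(0.0, 1.0, size=n_draws)  # scalar noise, one value per draw
    y = oob_filled + eps[:, None]             # mirrors Deterministic("y", OOB + noise)
    return {"OOB_sim": oob_sim, "OOB_missing": oob_missing, "eps": eps, "y": y}


# Usage with a hypothetical column containing two NaNs.
oob = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
draws = prior_draws_stripped(oob)
print(draws["OOB_missing"].shape)  # (500, 2)
```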