`pm.MvNormal` gives `You cannot drop a non-broadcastable dimension` error when one-dimensional and `chol` is stochastic

My code was giving the error ValueError: ('You cannot drop a non-broadcastable dimension.', ((False, False), [])), and I reduced it to the following minimal example:

import pymc3 as pm
import numpy as np

D = 1 # no error with `D = 2` instead

with pm.Model() as model:
    sd = pm.HalfCauchy("sd", beta=0.1) # no error with `sd = 0.5` instead (if you add some random variable to avoid the "The model does not contain any free variables." error)
    pm.MvNormal("ob", mu=np.zeros((D)), chol=sd*np.eye(D), observed=np.zeros((D)))
    idata = pm.sample(2000, tune=1500, return_inferencedata=True)

Note that the code runs fine if you either set D to a value greater than 1, or set sd to a plain float rather than a random variable.

Is this a bug with pm.MvNormal, or does something weird happen with dimensions when you multiply a 1x1 matrix and a random variable? Or am I just doing something stupid…

Full log is here:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-ee5ade6fe7a9> in <module>
      7     sd = pm.HalfCauchy("sd", beta=0.1)
      8     pm.MvNormal("ob", mu=np.zeros((D)), chol=sd*np.eye(D), observed=np.zeros((D)))
----> 9     idata = pm.sample(2000, tune=1500, return_inferencedata=True)

~/.local/lib/python3.8/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, return_inferencedata, idata_kwargs, mp_ctx, pickle_backend, **kwargs)
    479             # By default, try to use NUTS
    480             _log.info("Auto-assigning NUTS sampler...")
--> 481             start_, step = init_nuts(
    482                 init=init,
    483                 chains=chains,

~/.local/lib/python3.8/site-packages/pymc3/sampling.py in init_nuts(init, chains, n_init, model, random_seed, progressbar, **kwargs)
   2168         raise ValueError("Unknown initializer: {}.".format(init))
   2169 
-> 2170     step = pm.NUTS(potential=potential, model=model, **kwargs)
   2171 
   2172     return start, step

~/.local/lib/python3.8/site-packages/pymc3/step_methods/hmc/nuts.py in __init__(self, vars, max_treedepth, early_max_treedepth, **kwargs)
    166         `pm.sample` to the desired number of tuning steps.
    167         """
--> 168         super().__init__(vars, **kwargs)
    169 
    170         self.max_treedepth = max_treedepth

~/.local/lib/python3.8/site-packages/pymc3/step_methods/hmc/base_hmc.py in __init__(self, vars, scaling, step_scale, is_cov, model, blocked, potential, dtype, Emax, target_accept, gamma, k, t0, adapt_step_size, step_rand, **theano_kwargs)
     91         vars = inputvars(vars)
     92 
---> 93         super().__init__(vars, blocked=blocked, model=model, dtype=dtype, **theano_kwargs)
     94 
     95         self.adapt_step_size = adapt_step_size

~/.local/lib/python3.8/site-packages/pymc3/step_methods/arraystep.py in __init__(self, vars, model, blocked, dtype, **theano_kwargs)
    241         self.blocked = blocked
    242 
--> 243         func = model.logp_dlogp_function(
    244             vars, dtype=dtype, **theano_kwargs)
    245 

~/.local/lib/python3.8/site-packages/pymc3/model.py in logp_dlogp_function(self, grad_vars, **kwargs)
    933         varnames = [var.name for var in grad_vars]
    934         extra_vars = [var for var in self.free_RVs if var.name not in varnames]
--> 935         return ValueGradFunction(self.logpt, grad_vars, extra_vars, **kwargs)
    936 
    937     @property

~/.local/lib/python3.8/site-packages/pymc3/model.py in __init__(self, cost, grad_vars, extra_vars, dtype, casting, **kwargs)
    647         )
    648 
--> 649         grad = tt.grad(self._cost_joined, self._vars_joined)
    650         grad.name = "__grad"
    651 

~/.local/lib/python3.8/site-packages/theano/gradient.py in grad(cost, wrt, consider_constant, disconnected_inputs, add_names, known_grads, return_disconnected, null_gradients)
    602             assert g.type.dtype in tensor.float_dtypes
    603 
--> 604     rval = _populate_grad_dict(var_to_app_to_idx,
    605                                grad_dict, wrt, cost_name)
    606 

~/.local/lib/python3.8/site-packages/theano/gradient.py in _populate_grad_dict(var_to_app_to_idx, grad_dict, wrt, cost_name)
   1369         return grad_dict[var]
   1370 
-> 1371     rval = [access_grad_cache(elem) for elem in wrt]
   1372 
   1373     return rval

~/.local/lib/python3.8/site-packages/theano/gradient.py in <listcomp>(.0)
   1369         return grad_dict[var]
   1370 
-> 1371     rval = [access_grad_cache(elem) for elem in wrt]
   1372 
   1373     return rval

~/.local/lib/python3.8/site-packages/theano/gradient.py in access_grad_cache(var)
   1324                     for idx in node_to_idx[node]:
   1325 
-> 1326                         term = access_term_cache(node)[idx]
   1327 
   1328                         if not isinstance(term, gof.Variable):

~/.local/lib/python3.8/site-packages/theano/gradient.py in access_term_cache(node)
   1019             inputs = node.inputs
   1020 
-> 1021             output_grads = [access_grad_cache(var) for var in node.outputs]
   1022 
   1023             # list of bools indicating if each output is connected to the cost

~/.local/lib/python3.8/site-packages/theano/gradient.py in <listcomp>(.0)
   1019             inputs = node.inputs
   1020 
-> 1021             output_grads = [access_grad_cache(var) for var in node.outputs]
   1022 
   1023             # list of bools indicating if each output is connected to the cost

~/.local/lib/python3.8/site-packages/theano/gradient.py in access_grad_cache(var)
   1324                     for idx in node_to_idx[node]:
   1325 
-> 1326                         term = access_term_cache(node)[idx]
   1327 
   1328                         if not isinstance(term, gof.Variable):

~/.local/lib/python3.8/site-packages/theano/gradient.py in access_term_cache(node)
   1019             inputs = node.inputs
   1020 
-> 1021             output_grads = [access_grad_cache(var) for var in node.outputs]
   1022 
   1023             # list of bools indicating if each output is connected to the cost

~/.local/lib/python3.8/site-packages/theano/gradient.py in <listcomp>(.0)
   1019             inputs = node.inputs
   1020 
-> 1021             output_grads = [access_grad_cache(var) for var in node.outputs]
   1022 
   1023             # list of bools indicating if each output is connected to the cost

~/.local/lib/python3.8/site-packages/theano/gradient.py in access_grad_cache(var)
   1324                     for idx in node_to_idx[node]:
   1325 
-> 1326                         term = access_term_cache(node)[idx]
   1327 
   1328                         if not isinstance(term, gof.Variable):

~/.local/lib/python3.8/site-packages/theano/gradient.py in access_term_cache(node)
   1019             inputs = node.inputs
   1020 
-> 1021             output_grads = [access_grad_cache(var) for var in node.outputs]
   1022 
   1023             # list of bools indicating if each output is connected to the cost

~/.local/lib/python3.8/site-packages/theano/gradient.py in <listcomp>(.0)
   1019             inputs = node.inputs
   1020 
-> 1021             output_grads = [access_grad_cache(var) for var in node.outputs]
   1022 
   1023             # list of bools indicating if each output is connected to the cost

~/.local/lib/python3.8/site-packages/theano/gradient.py in access_grad_cache(var)
   1324                     for idx in node_to_idx[node]:
   1325 
-> 1326                         term = access_term_cache(node)[idx]
   1327 
   1328                         if not isinstance(term, gof.Variable):

~/.local/lib/python3.8/site-packages/theano/gradient.py in access_term_cache(node)
   1019             inputs = node.inputs
   1020 
-> 1021             output_grads = [access_grad_cache(var) for var in node.outputs]
   1022 
   1023             # list of bools indicating if each output is connected to the cost

~/.local/lib/python3.8/site-packages/theano/gradient.py in <listcomp>(.0)
   1019             inputs = node.inputs
   1020 
-> 1021             output_grads = [access_grad_cache(var) for var in node.outputs]
   1022 
   1023             # list of bools indicating if each output is connected to the cost

~/.local/lib/python3.8/site-packages/theano/gradient.py in access_grad_cache(var)
   1324                     for idx in node_to_idx[node]:
   1325 
-> 1326                         term = access_term_cache(node)[idx]
   1327 
   1328                         if not isinstance(term, gof.Variable):

~/.local/lib/python3.8/site-packages/theano/gradient.py in access_term_cache(node)
   1159                                 str(g_shape))
   1160 
-> 1161                 input_grads = node.op.L_op(inputs, node.outputs,
   1162                                            new_output_grads)
   1163 

~/.local/lib/python3.8/site-packages/theano/gof/op.py in L_op(self, inputs, outputs, output_grads)
    709 
    710     def L_op(self, inputs, outputs, output_grads):
--> 711         return self.grad(inputs, output_grads)
    712 
    713     def R_op(self, inputs, eval_points):

~/.local/lib/python3.8/site-packages/theano/tensor/elemwise.py in grad(self, inp, grads)
    303             return [inp[0].zeros_like(dtype=theano.config.floatX)]
    304         else:
--> 305             return [DimShuffle(gz.type.broadcastable, grad_order)(
    306                 Elemwise(scalar.identity)(gz))]
    307 

~/.local/lib/python3.8/site-packages/theano/tensor/elemwise.py in __init__(self, input_broadcastable, new_order, inplace)
    193                 else:
    194                     # we cannot drop non-broadcastable dimensions
--> 195                     raise ValueError(
    196                         "You cannot drop a non-broadcastable dimension.",
    197                         (input_broadcastable, new_order))

ValueError: ('You cannot drop a non-broadcastable dimension.', ((False, False), []))

Playing around with this, I tried chol=np.zeros(1) and got ValueError: chol must be two dimensional. (the same happens if I use cov instead) - which makes sense to me, because what would a one-dimensional covariance matrix even mean? sd*np.eye(D) is effectively only giving a single value when D is 1, though. The ValueError: chol must be two dimensional should probably be thrown in your case too - possibly a bug that it’s not, @pymc_devs_bot?

That said, I did get it to run (albeit with acceptance probability warnings) using sd = pm.HalfCauchy("sd", beta=0.1, shape=(1, 1)) - though I think that still just gives a single value, so probably I’m misunderstanding something.
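For reference, this is roughly what that workaround looks like dropped into the minimal example above (just a sketch based on what ran for me, not re-verified beyond that):

import pymc3 as pm
import numpy as np

with pm.Model() as model:
    # giving sd an explicit (1, 1) shape keeps the product two-dimensional
    sd = pm.HalfCauchy("sd", beta=0.1, shape=(1, 1))
    pm.MvNormal("ob", mu=np.zeros(1), chol=sd * np.eye(1), observed=np.zeros(1))
    idata = pm.sample(2000, tune=1500, return_inferencedata=True)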

10/10 for a perfect example of how to ask a question involving code, by the way.

Thanks! It’s interesting that shape=(1,1) works - and that at least gives a valid workaround for the problem.

I just ran across another workaround - replacing sd*np.eye(D) with (sd*np.eye(D))[:].

I’m pretty sure sd*np.eye(1) should be a 1x1 two-dimensional matrix (scalar*matrix = matrix), though I’m not sure how to check whether it really is - certainly sd*np.eye(2) seems to be two-dimensional, and I don’t see why D = 1 should behave any differently.
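One way I can think of to check (a hypothetical debugging snippet, using a plain Theano scalar as a stand-in for the HalfCauchy variable) is to look at the symbolic result's ndim and broadcastable attributes:

import numpy as np
import theano.tensor as tt

sd = tt.dscalar("sd")      # stand-in for the random variable
chol = sd * np.eye(1)

print(chol.ndim)           # 2, so it is a matrix as expected
print(chol.broadcastable)  # the broadcast pattern of its size-1 dimensions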

Hi, I’m not quite sure what your end goal is. Do you wish to model something univariate or multivariate? If univariate, you can use pm.Normal() rather than pm.MvNormal(); if multivariate, you can use pm.MvNormal() but set D >= 2.

The chol parameter in MvNormal should be the Cholesky factor, which when multiplied by its own transpose gives the covariance matrix. You are passing in the covariance matrix, so you should use the cov parameter instead (though in your code this makes little difference, since the matrix is diagonal).
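To illustrate the relationship (a generic NumPy sketch, not tied to the model above):

import numpy as np

cov = np.array([[2.0, 0.3],
                [0.3, 1.0]])
L = np.linalg.cholesky(cov)        # lower-triangular Cholesky factor
assert np.allclose(L @ L.T, cov)   # cov = L @ L.T

# so pm.MvNormal(..., cov=cov) and pm.MvNormal(..., chol=L)
# describe the same distribution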

Also, you are passing observed data with no variance at all; I’m not sure how well PyMC3 will handle that. If you are just testing things out, I suggest using pseudo-random numbers from np.random.
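For example, something along these lines (a sketch with made-up data and D = 2; the seed and sample sizes are arbitrary):

import numpy as np
import pymc3 as pm

D = 2
obs = np.random.default_rng(0).normal(size=(100, D))  # fake data with some spread

with pm.Model():
    sd = pm.HalfCauchy("sd", beta=0.1)
    pm.MvNormal("ob", mu=np.zeros(D), cov=sd * np.eye(D), observed=obs)
    idata = pm.sample(1000, tune=1000, return_inferencedata=True)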

Oh sorry, I probably should have mentioned - this isn’t my actual code; I ran into this bug and then reduced it down to the simplest program that still shows the problem. I’m certainly not worried about the fact that this model doesn’t sample well - I just don’t think it should raise a Theano error, and wanted to check whether this is a bug or something I was doing wrong.

The reason I ended up with a 1-dimensional multivariate normal is that my actual code loops through several sets of observations, each of which corresponds to a multivariate normal of a different dimension. So I don’t want to add a special case for sets that contain only a single observation, when the same code ought to work fine.
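Hypothetically, the structure is something like this (stand-in data and names; the real sets and priors are different):

import numpy as np
import pymc3 as pm

obs_sets = [np.zeros(3), np.zeros(1), np.zeros(2)]  # stand-in observation groups

with pm.Model():
    sd = pm.HalfCauchy("sd", beta=0.1)
    for i, obs in enumerate(obs_sets):
        D = len(obs)
        # the D == 1 group is the one that triggers the error above
        pm.MvNormal(f"ob_{i}", mu=np.zeros(D), chol=sd * np.eye(D), observed=obs)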

Looks like there’s definitely some sort of bug here, so I’ve opened an issue.
