Problem fitting a multivariate mixture of Gaussians

Hi,

My overall aim is to build a multivariate Gaussian mixture model with a Dirichlet prior on the weights in PyMC3. I am trying to run the notebook from Austin Rochford available here: https://gist.github.com/AustinRochford/41109579c8be23a10e2cd167209dbe25, but I am having trouble getting the code to run without errors.

Here is the error I get:

AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in _comp_logp(self, value)
    109 
--> 110             return comp_dists.logp(value_)
    111         except AttributeError:

AttributeError: 'list' object has no attribute 'logp'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-8-374c618ef5c0> in <module>()
      5     #w = pm.Deterministic('w', stick_breaking(beta))
      6     w = pm.Dirichlet('w', np.ones(K))
----> 7     obs = pm.Mixture('obs', w, [multivariatenormal(init_mean, init_sigma, init_corr, k, dist=True) for k in range(K)], observed=data)

/usr/local/lib/python3.6/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
     34                 raise TypeError("observed needs to be data but got: {}".format(type(data)))
     35             total_size = kwargs.pop('total_size', None)
---> 36             dist = cls.dist(*args, **kwargs)
     37             return model.Var(name, dist, data, total_size)
     38         else:

/usr/local/lib/python3.6/site-packages/pymc3/distributions/distribution.py in dist(cls, *args, **kwargs)
     45     def dist(cls, *args, **kwargs):
     46         dist = object.__new__(cls)
---> 47         dist.__init__(*args, **kwargs)
     48         return dist
     49 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in __init__(self, w, comp_dists, *args, **kwargs)
     91         try:
     92             comp_modes = self._comp_modes()
---> 93             comp_mode_logps = self.logp(comp_modes)
     94             self.mode = comp_modes[tt.argmax(w * comp_mode_logps, axis=-1)]
     95 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in logp(self, value)
    139         w = self.w
    140 
--> 141         return bound(logsumexp(tt.log(w) + self._comp_logp(value), axis=-1).sum(),
    142                      w >= 0, w <= 1, tt.allclose(w.sum(axis=-1), 1),
    143                      broadcast_conditions=False)

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in _comp_logp(self, value)
    110             return comp_dists.logp(value_)
    111         except AttributeError:
--> 112             return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],
    113                             axis=1)
    114 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in <listcomp>(.0)
    110             return comp_dists.logp(value_)
    111         except AttributeError:
--> 112             return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],
    113                             axis=1)
    114 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/multivariate.py in logp(self, value)
    268 
    269     def logp(self, value):
--> 270         quaddist, logdet, ok = self._quaddist(value)
    271         k = value.shape[-1].astype(theano.config.floatX)
    272         norm = - 0.5 * k * pm.floatX(np.log(2 * np.pi))

/usr/local/lib/python3.6/site-packages/pymc3/distributions/multivariate.py in _quaddist(self, value)
     87             onedim = False
     88 
---> 89         delta = value - mu
     90 
     91         if self._cov_type == 'cov':

/usr/local/lib/python3.6/site-packages/theano/tensor/var.py in __sub__(self, other)
    145         # and the return value in that case
    146         try:
--> 147             return theano.tensor.basic.sub(self, other)
    148         except (NotImplementedError, AsTensorError):
    149             return NotImplemented

/usr/local/lib/python3.6/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    672                 thunk.outputs = [storage_map[v] for v in node.outputs]
    673 
--> 674                 required = thunk()
    675                 assert not required  # We provided all inputs
    676 

/usr/local/lib/python3.6/site-packages/theano/gof/op.py in rval()
    860 
    861         def rval():
--> 862             thunk()
    863             for o in node.outputs:
    864                 compute_map[o][0] = True

/usr/local/lib/python3.6/site-packages/theano/gof/cc.py in __call__(self)
   1733                 print(self.error_storage, file=sys.stderr)
   1734                 raise
-> 1735             reraise(exc_type, exc_value, exc_trace)
   1736 
   1737 

/usr/local/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

ValueError: Input dimension mis-match. (input[0].shape[1] = 3, input[1].shape[1] = 2)

It seems the way dimensions are handled has changed in the past year? Apparently, the mixture now expects the input data to have the same dimensionality as the number of mixture components (!)
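To illustrate, here is a minimal NumPy analogue of the broadcast failure reported above (my assumption about what happens internally: the K stacked per-component values cannot be subtracted from a D-dimensional mean when K != D):

import numpy as np

K, D = 3, 2
stacked = np.ones((5, K))  # one column per mixture component
mu = np.ones((5, D))       # a D-dimensional mean, broadcast over 5 rows

try:
    stacked - mu  # mirrors 'delta = value - mu' in _quaddist
except ValueError as e:
    print(e)  # operands could not be broadcast together with shapes (5,3) (5,2)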

I have also tried explicitly passing the shape parameter to the mixture components and to the overall mixture, but I get the same error:

def multivariatenormal(init_mean, init_sigma, init_corr, suffix="", dist=False):
    if not isinstance(suffix, str):
        suffix = str(suffix)
    D = len(init_sigma)

    # Per-dimension standard deviations
    sigma = pm.Lognormal('sigma' + suffix, np.zeros(D), np.ones(D), shape=D,
                         testval=init_sigma)
    # LKJ prior on the packed upper triangle of the correlation matrix
    nu = pm.Uniform('nu' + suffix, 0, 5)
    C_triu = pm.LKJCorr('C_triu' + suffix, nu, D, testval=init_corr)
    cov = pm.Deterministic('cov' + suffix, make_cov_matrix(sigma, C_triu, module=tt))

    # Component mean; its shape should match the data dimensionality D
    mu = pm.MvNormal('mu' + suffix, 0, cov, shape=D, testval=init_mean)

    return pm.MvNormal.dist(mu, cov) if dist else pm.MvNormal('mvn' + suffix, mu, cov, shape=data.shape)

def stick_breaking(beta):
    # Convert Beta draws into Dirichlet-process-style mixture weights
    portion_remaining = tt.concatenate([[1], tt.extra_ops.cumprod(1 - beta)[:-1]])
    return beta * portion_remaining

K = 3
with pm.Model() as model:
    #alpha = pm.Gamma('alpha', 1., 1.)
    #beta = pm.Beta('beta', 1, alpha, shape=K)
    #w = pm.Deterministic('w', stick_breaking(beta))
    w = pm.Dirichlet('w', np.ones(K))
    obs = pm.Mixture('obs', w, [multivariatenormal(init_mean, init_sigma, init_corr, k, dist=True) for k in range(K)], observed=data, shape=data.shape)
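(For completeness: the make_cov_matrix helper used above is defined in the linked gist. Here is a minimal sketch of what such a helper does, reconstructed under the assumption of two-dimensional data; the gist's actual implementation may differ.)

import numpy as np
import theano.tensor as tt

def make_cov_matrix(sigma, C_triu, module=tt):
    # Hypothetical reconstruction: expand the packed LKJCorr upper
    # triangle into a full correlation matrix, then scale by sigma.
    # 'module' is kept only for signature compatibility with the gist.
    D = 2  # the notebook's data is two-dimensional; adjust as needed
    n_elem = D * (D - 1) // 2
    tri_index = np.zeros((D, D), dtype=int)
    tri_index[np.triu_indices(D, k=1)] = np.arange(n_elem)
    tri_index[np.triu_indices(D, k=1)[::-1]] = np.arange(n_elem)
    corr = C_triu[tri_index]                     # scatter into a symmetric matrix
    corr = corr * (1.0 - np.eye(D)) + np.eye(D)  # exact ones on the diagonal
    return sigma[:, None] * corr * sigma[None, :]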

And here I get essentially the same error (the traceback is identical apart from the call site):

ValueError                                Traceback (most recent call last)
<ipython-input-24-9772e61f2786> in <module>()
      5     #w = pm.Deterministic('w', stick_breaking(beta))
      6     w = pm.Dirichlet('w', np.ones(K))
----> 7     obs = pm.Mixture('obs', w, [multivariatenormal(init_mean, init_sigma, init_corr, k, dist=True) for k in range(K)], observed=data, shape=data.shape)

[... the remaining frames are identical to the traceback above ...]

ValueError: Input dimension mis-match. (input[0].shape[1] = 3, input[1].shape[1] = 2)

I was referred here from a comment on a GitHub issue on PyMC3: https://github.com/pymc-devs/pymc3/issues/1811#issuecomment-364265697. Any help would be greatly appreciated.

Environment details:
OS: macOS High Sierra 10.13.2 (17C205)
Python: 3.6.3
theano: 1.0.1
pymc3: 3.3

Initially, I thought the problem was in the function multivariatenormal, at the line return pm.MvNormal.dist(mu, cov) if dist else pm.MvNormal('mvn' + suffix, mu, cov), where a shape should be explicitly specified. But after running the code, it turns out that is not the case. The problem is that Mixture tries to compute a mode by evaluating the logp, and the wrong shape of the mode is not caught there. For now, wrapping the mixture logp in a DensityDist or a Potential should allow you to continue with the sampling:

import numpy as np
import theano
import theano.tensor as tt
import pymc3 as pm
from pymc3.math import logsumexp

K = 3
with pm.Model() as model:
    #alpha = pm.Gamma('alpha', 1., 1.)
    #beta = pm.Beta('beta', 1, alpha, shape=K)
    #w = pm.Deterministic('w', stick_breaking(beta))
    w = pm.Dirichlet('w', np.ones(K))
    comp_dists = [multivariatenormal(init_mean, init_sigma, init_corr, k, dist=True)
                  for k in range(K)]
    # Evaluate each component's logp on the observed data directly,
    # bypassing Mixture's mode computation that trips over the shapes
    logpcomp = tt.stack([comp_dist.logp(theano.shared(data)) for comp_dist in comp_dists],
                        axis=1)
    pm.Potential('logp', logsumexp(tt.log(w) + logpcomp, axis=-1).sum())
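Equivalently, the same log-density can be wrapped in a DensityDist, the other option mentioned above; a minimal sketch, assuming the same multivariatenormal helper and data:

with pm.Model() as model_dd:
    w = pm.Dirichlet('w', np.ones(K))
    comp_dists = [multivariatenormal(init_mean, init_sigma, init_corr, k, dist=True)
                  for k in range(K)]

    def mixture_logp(value):
        # Per-observation mixture log-density: logsumexp over the K components
        logpcomp = tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],
                            axis=1)
        return logsumexp(tt.log(w) + logpcomp, axis=-1).sum()

    obs = pm.DensityDist('obs', mixture_logp, observed=data)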

I will try to send a bugfix in the meantime.

Thank you for your help. I think I am running into another bug: when I try to fit this model, it stops at 0 iterations and I get a Problem Reporter dialog on my Mac saying “Python quit unexpectedly”…

I may need to investigate this other issue, but hopefully the code you have provided will work for me. :slight_smile:

That’s a memory problem… setting njobs=1 should fix it :wink:
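For example (a sketch, assuming the model built above):

with model:
    # A single process avoids the multiprocessing memory blow-up that
    # crashes the Python interpreter
    trace = pm.sample(1000, njobs=1)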

Okay, thank you, I’ve tried it and it works! :grinning: I get different results from those in the example, plus some warnings about convergence, but I suppose that’s because of different random seeds.

I am using the code as discussed in this post, but the bug is still present. Is this expected?

Also, I can only sample with Metropolis. When the NUTS sampler is used, I still get the same shape mismatch:

Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.6/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
ValueError: Input dimension mis-match. (input[0].shape[0] = 2, input[1].shape[0] = 1)
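For reference, this is how I sample with Metropolis (a sketch, assuming the model defined earlier in the thread):

with model:
    step = pm.Metropolis()  # NUTS hits the shape mismatch above; Metropolis runs
    trace = pm.sample(5000, step=step, njobs=1)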

I just tried with the code above and it runs fine. Are you using the same data generation process as in the notebook from Austin Rochford? See: GMM Metropolis Possible Bug · GitHub

Also, here is a more recent example: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/WIP/[WIP]%20Bayesian%20GMM.ipynb