Mixture models and breaking class symmetry

Hi there,

I’m having an issue with a simple mixture model. The samples are coming out correctly but they’re a little annoying to use. I have a model for 3 factories that mint coins with different biases:

import numpy as np
import pymc3 as mc

with mc.Model():
    rates = mc.Beta('rates', 2, 2, shape=3)
    factory_choice = mc.Dirichlet('factory_choice', a=np.array([1., 1., 1.]))
    factory = mc.Categorical('factory', p=factory_choice, shape=100)
    flips = mc.Binomial('flips', n=10, p=rates[factory], observed=heads)  # heads: observed head counts out of 10 flips per coin

Which produces this trace (trace plot omitted):

The true rates are 0.1, 0.3, 0.8, so the samples look about right when subjected to the appropriate squint. What’s annoying is that they oscillate between which class is assigned to which rate, i.e. [0.1, 0.3, 0.8] is exactly as likely as [0.8, 0.1, 0.3], etc.

Here’s my question: is there a way to enforce a constraint so that rates[0] < rates[1] < rates[2] or something similar?

So far, I tried wrapping the rates in theano.tensor.sort, both in the declaration and where they are passed as the Binomial parameter. I also tried tweaking the parameters of the Dirichlet prior to break the symmetry, but of course there are still local extrema at each permutation of the correct rates, and class switching still happens (albeit less frequently). I’d like to avoid baking in too much external bias anyway. I can also sort each sample afterwards, but I imagine there may still be some bad behavior that this constraint could limit.
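
For reference, the after-the-fact sorting I have in mind would be something like this (assuming trace is the result of mc.sample):

import numpy as np

# Relabel each posterior draw so component 0 is always the smallest rate and
# component 2 the largest; this tidies up the samples but does not change the
# geometry the sampler actually explores.
sorted_rates = np.sort(trace['rates'], axis=1)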

Thank you!
Brad


You can restrict the rates to be ordered:

import pymc3 as pm
import theano.tensor as tt

class Ordered(pm.distributions.transforms.ElemwiseTransform):
    name = "ordered"

    def forward(self, x):
        out = tt.zeros(x.shape)
        out = tt.inc_subtensor(out[0], x[0])
        out = tt.inc_subtensor(out[1:], tt.log(x[1:] - x[:-1]))
        return out

    def backward(self, y):
        out = tt.zeros(y.shape)
        out = tt.inc_subtensor(out[0], y[0])
        out = tt.inc_subtensor(out[1:], tt.exp(y[1:]))
        return tt.cumsum(out)

    def jacobian_det(self, y):
        return tt.sum(y[1:])

with pm.Model() as model:
    rates = pm.Beta('rates', 2, 2, shape=3, transform=Ordered())
    ...
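
The idea behind the transform: the unconstrained vector stores the first value plus the logs of the successive differences, so the backward map (a cumulative sum of exponentiated increments) is always increasing. A quick numpy sketch of the round trip, just to illustrate the mapping (not part of the transform itself):

import numpy as np

x = np.array([0.1, 0.3, 0.8])                                # an ordered vector
y = np.concatenate([[x[0]], np.log(np.diff(x))])             # forward: first value, then log-differences
x_back = np.cumsum(np.concatenate([[y[0]], np.exp(y[1:])]))  # backward: cumsum of positive increments
print(np.allclose(x, x_back))                                # True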

@aseyboldt wow this is a very nice trick! I was doing

    # use sort to break multimodality
    rate_sort = pm.Deterministic('mixing', tt.sort(rate))

But it breaks the geometry. [Edit] Actually, tt.sort also has the correct gradient.

Could you please add this to pymc3? It is very useful for working with mixture models!


Thanks for your help! Unfortunately this immediately causes an error for me in the Beta definition:

ValueError: The index list is longer (size 1) than the number of dimensions 
of the tensor(namely 0). You are asking for a dimension of the tensor that 
does not exist! You might need to use dimshuffle to add extra dimension 
to your tensor.

The offending line is

out = tt.inc_subtensor(out[0], x[0])

where it seems x == 0.5, not sure where that comes from.

Anyway, I’m new to pymc and theano so it’s tough for me to figure this one out. I’ll start reading up on theano and see if anything sticks out.

Thanks for all your help!

Ah, I posted old code that used to work in slightly different circumstances. Lesson for next time: Always execute code before posting…

This should work (tested on master):

import numpy as np
import pymc3 as pm
import theano.tensor as tt


class Ordered(pm.distributions.transforms.ElemwiseTransform):
    name = "ordered"

    def forward(self, x):
        out = tt.zeros(x.shape)
        out = tt.inc_subtensor(out[0], x[0])
        out = tt.inc_subtensor(out[1:], tt.log(x[1:] - x[:-1]))
        return out
    
    def forward_val(self, x, point=None):
        x, = pm.distributions.distribution.draw_values([x], point=point)
        return self.forward(x)

    def backward(self, y):
        out = tt.zeros(y.shape)
        out = tt.inc_subtensor(out[0], y[0])
        out = tt.inc_subtensor(out[1:], tt.exp(y[1:]))
        return tt.cumsum(out)

    def jacobian_det(self, y):
        return tt.sum(y[1:])


class Composed(pm.distributions.transforms.Transform):    
    def __init__(self, trafo1, trafo2):
        self._trafo1 = trafo1
        self._trafo2 = trafo2
        self.name = '_'.join([trafo1.name, trafo2.name])

    def forward(self, x):
        return self._trafo2.forward(self._trafo1.forward(x))
    
    def forward_val(self, x, point=None):
        return self.forward(x)

    def backward(self, y):
        return self._trafo1.backward(self._trafo2.backward(y))

    def jacobian_det(self, y):
        y2 = self._trafo2.backward(y)
        det1 = self._trafo1.jacobian_det(y2)
        det2 = self._trafo2.jacobian_det(y)
        return det1 + det2


with pm.Model() as model:
    trafo = Composed(pm.distributions.transforms.LogOdds(), Ordered())
    rates = pm.Beta('rates', 2, 2, shape=3, transform=trafo,
                    testval=[0.3, 0.4, 0.5])
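
If you plug this back into the model from the first post, the rest could look roughly like this (assuming heads is the observed data and the same priors as before):

with model:
    factory_choice = pm.Dirichlet('factory_choice', a=np.ones(3))
    factory = pm.Categorical('factory', p=factory_choice, shape=100)
    flips = pm.Binomial('flips', n=10, p=rates[factory], observed=heads)
    trace = pm.sample(1000, tune=1000)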

This is seriously amazing. Thank you so much!


No problem, glad I could help. 🙂
If you’d like to improve convergence further, you could try to marginalize out the discrete parameters. Sadly, we don’t have a good guide on how to do that yet, but you could have a look at the Stan documentation; it has a chapter about this (Chapter 15.2 in https://github.com/stan-dev/stan/releases/download/v2.16.0/stan-reference-2.16.0.pdf).
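
For this model, a marginalized version could be sketched with pm.Mixture, roughly along these lines (again assuming heads from the first post and the ordered trafo defined above; the per-flip factory assignments are summed out rather than sampled):

with pm.Model() as marginal_model:
    rates = pm.Beta('rates', 2, 2, shape=3, transform=trafo,
                    testval=[0.3, 0.4, 0.5])
    w = pm.Dirichlet('w', a=np.ones(3))
    # one Binomial component per factory; the last axis indexes the components
    components = pm.Binomial.dist(n=10, p=rates, shape=3)
    flips = pm.Mixture('flips', w=w, comp_dists=components, observed=heads)
    trace = pm.sample(1000, tune=1000)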


Or a special Gibbs-MH sampler… which we also don’t have currently…

Or a discrete Hamiltonian sampler, which we will hopefully get soon 😉


@junpenglao, since tt.sort also has the correct gradient, is it recommended to do it this way or @aseyboldt’s way? tt.sort seems a lot cleaner.

tt.sort should give the correct gradient, but you should go with @aseyboldt’s solution, as sorting alone does not produce the correct logp.
