Mixture models and breaking class symmetry

Hi there,

I’m having an issue with a simple mixture model. The samples are coming out correctly but they’re a little annoying to use. I have a model for 3 factories that mint coins with different biases:

import numpy as np
import pymc3 as mc

with mc.Model():
    rates = mc.Beta('rates', 2, 2, shape=3)
    factory_choice = mc.Dirichlet('factory_choice', a=np.array([1., 1., 1.]))
    factory = mc.Categorical('factory', p=factory_choice, shape=100)
    flips = mc.Binomial('flips', n=10, p=rates[factory], observed=heads)  # heads: observed head counts out of 10 flips per coin

Which produces this trace (trace plot omitted):

The true rates are 0.1, 0.3, 0.8, so the samples look about right when subjected to the appropriate squint. What’s annoying is that they oscillate between which class is assigned to which rate, i.e. [0.1, 0.3, 0.8] is exactly as likely as [0.8, 0.1, 0.3], etc.

Here’s my question: is there a way to enforce a constraint so that rates[0] < rates[1] < rates[2] or something similar?

So far, I tried wrapping the rates in theano.tensor.sort, both in the declaration and where they are passed as the Binomial parameter. I also tried tweaking the parameters of the Dirichlet prior to break the symmetry, but of course there are still local extrema at each permutation of the correct rates, and class switching still happens (albeit less frequently). I’d like to avoid baking in too much external bias anyway. I can also sort each sample afterwards, but I imagine there may still be some bad behavior that this constraint could limit.
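
For reference, the after-the-fact sorting I have in mind would be something like this (assuming trace is the result of mc.sample):

import numpy as np

# Relabel each posterior draw so component 0 is always the smallest rate and
# component 2 the largest; this tidies up the samples but does not change the
# geometry the sampler actually explores.
sorted_rates = np.sort(trace['rates'], axis=1)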

Thank you!
Brad


You can restrict the rates to be ordered:

import pymc3 as pm
import theano.tensor as tt

class Ordered(pm.distributions.transforms.ElemwiseTransform):
    name = "ordered"

    def forward(self, x):
        out = tt.zeros(x.shape)
        out = tt.inc_subtensor(out[0], x[0])
        out = tt.inc_subtensor(out[1:], tt.log(x[1:] - x[:-1]))
        return out

    def backward(self, y):
        out = tt.zeros(y.shape)
        out = tt.inc_subtensor(out[0], y[0])
        out = tt.inc_subtensor(out[1:], tt.exp(y[1:]))
        return tt.cumsum(out)

    def jacobian_det(self, y):
        return tt.sum(y[1:])

with pm.Model() as model:
    rates = pm.Beta('rates', 2, 2, shape=3, transform=Ordered())
    ...
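
The idea behind the transform: the unconstrained vector stores the first value plus the logs of the successive differences, so the backward map (a cumulative sum of exponentiated increments) is always increasing. A quick numpy sketch of the round trip, just to illustrate the mapping (not part of the transform itself):

import numpy as np

x = np.array([0.1, 0.3, 0.8])                                # an ordered vector
y = np.concatenate([[x[0]], np.log(np.diff(x))])             # forward: first value, then log-differences
x_back = np.cumsum(np.concatenate([[y[0]], np.exp(y[1:])]))  # backward: cumsum of positive increments
print(np.allclose(x, x_back))                                # True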

@aseyboldt wow this is a very nice trick! I was doing

    # use sort to break multimodality
    rate_sort = pm.Deterministic('mixing', tt.sort(rate))

But it breaks the geometry. [Edit] Actually, tt.sort also has the correct gradient.

Could you please add this to pymc3? It is very useful for working with mixture models!


Thanks for your help! Unfortunately this immediately causes an error for me in the Beta definition:

ValueError: The index list is longer (size 1) than the number of dimensions 
of the tensor(namely 0). You are asking for a dimension of the tensor that 
does not exist! You might need to use dimshuffle to add extra dimension 
to your tensor.

The offending line is

out = tt.inc_subtensor(out[0], x[0])

where it seems x == 0.5, not sure where that comes from.

Anyway, I’m new to pymc and theano so it’s tough for me to figure this one out. I’ll start reading up on theano and see if anything sticks out.

Thanks for all your help!

Ah, I posted old code that used to work in slightly different circumstances. Lesson for next time: Always execute code before posting…

This should work (tested on master):

import numpy as np
import pymc3 as pm
import theano.tensor as tt


class Ordered(pm.distributions.transforms.ElemwiseTransform):
    name = "ordered"

    def forward(self, x):
        out = tt.zeros(x.shape)
        out = tt.inc_subtensor(out[0], x[0])
        out = tt.inc_subtensor(out[1:], tt.log(x[1:] - x[:-1]))
        return out
    
    def forward_val(self, x, point=None):
        x, = pm.distributions.distribution.draw_values([x], point=point)
        return self.forward(x)

    def backward(self, y):
        out = tt.zeros(y.shape)
        out = tt.inc_subtensor(out[0], y[0])
        out = tt.inc_subtensor(out[1:], tt.exp(y[1:]))
        return tt.cumsum(out)

    def jacobian_det(self, y):
        return tt.sum(y[1:])


class Composed(pm.distributions.transforms.Transform):    
    def __init__(self, trafo1, trafo2):
        self._trafo1 = trafo1
        self._trafo2 = trafo2
        self.name = '_'.join([trafo1.name, trafo2.name])

    def forward(self, x):
        return self._trafo2.forward(self._trafo1.forward(x))
    
    def forward_val(self, x, point=None):
        return self.forward(x)

    def backward(self, y):
        return self._trafo1.backward(self._trafo2.backward(y))

    def jacobian_det(self, y):
        y2 = self._trafo2.backward(y)
        det1 = self._trafo1.jacobian_det(y2)
        det2 = self._trafo2.jacobian_det(y)
        return det1 + det2


with pm.Model() as model:
    trafo = Composed(pm.distributions.transforms.LogOdds(), Ordered())
    rates = pm.Beta('rates', 2, 2, shape=3, transform=trafo,
                    testval=[0.3, 0.4, 0.5])
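
If you plug this back into the model from the first post, the rest could look roughly like this (assuming heads is the observed data and the same priors as before):

with model:
    factory_choice = pm.Dirichlet('factory_choice', a=np.ones(3))
    factory = pm.Categorical('factory', p=factory_choice, shape=100)
    flips = pm.Binomial('flips', n=10, p=rates[factory], observed=heads)
    trace = pm.sample(1000, tune=1000)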

This is seriously amazing. Thank you so much!


No problem, glad I could help. 🙂
If you’d like to improve convergence further, you could try to marginalize out the discrete parameters. Sadly, we don’t have a good guide on how to do that yet, but you could have a look at the Stan documentation; it has a chapter about this (Chapter 15.2 in https://github.com/stan-dev/stan/releases/download/v2.16.0/stan-reference-2.16.0.pdf).
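
For this model, a marginalized version could be sketched with pm.Mixture, roughly along these lines (again assuming heads from the first post and the ordered trafo defined above; the per-flip factory assignments are summed out rather than sampled):

with pm.Model() as marginal_model:
    rates = pm.Beta('rates', 2, 2, shape=3, transform=trafo,
                    testval=[0.3, 0.4, 0.5])
    w = pm.Dirichlet('w', a=np.ones(3))
    # one Binomial component per factory; the last axis indexes the components
    components = pm.Binomial.dist(n=10, p=rates, shape=3)
    flips = pm.Mixture('flips', w=w, comp_dists=components, observed=heads)
    trace = pm.sample(1000, tune=1000)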


Or a special Gibbs-MH sampler… which we also don’t have currently…

Or a discrete Hamiltonian sampler, which we will hopefully get soon 😉


@junpenglao, since tt.sort also has the correct gradient, is it recommended to do it this way or @aseyboldt’s way? tt.sort seems a lot cleaner.

tt.sort should give the correct gradient, but you should go with @aseyboldt’s solution, as sorting alone does not produce the correct logp.
