Wrong calculation/chart on DPMM example?

I noticed the chart pictured below from the DPMM example doesn’t seem to be showing “normal” components correctly.

[image: components chart from the current DPMM example]

In the previous iteration of the example, the syntax for defining the model was different, but it produced what I believe are the correct chart/components (pictured below).

It's clear that the "components" in the chart above aren't normally distributed but bimodal; the chart below shows proper normal components.

[image: components chart from the previous version of the example]

Am I misunderstanding something about the output of DPMMs? Shouldn’t the components reflect the type of the base distribution?
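
For reference, my understanding is that the truncated stick-breaking construction in the notebook gives a density of the form

$$p(x) = \sum_{k=1}^{K} w_k \, \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right), \qquad w_k = \beta_k \prod_{j < k} (1 - \beta_j), \quad \beta_k \sim \mathrm{Beta}(1, \alpha),$$

so each individual component should itself be normal, even if the overall mixture is multimodal.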

If I'm correct, and there's a problem with the example, how can the correct components be obtained from the new DPMM syntax?

It seems there is some label switching in the new figure; we should rerun the notebook and check the inference more carefully.


When I run the notebook, NUTS doesn't converge.
Some of the largest weights have a Gelman-Rubin statistic above 3.
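
For reference, here's roughly how I'm checking it — a minimal sketch assuming the model and trace variables from the notebook (the exact name of the R-hat column depends on the PyMC3 version):

import pymc3 as pm

with model:
    trace = pm.sample(1000, tune=1000)

# The R-hat (Gelman-Rubin) column flags the weights and means whose
# chains disagree with each other, i.e. the label switching.
print(pm.summary(trace))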

Should I try to identify the label switching and manually fix it, or do you suspect there is a problem with the implementation in the new notebook causing this? I'm curious why the older syntax works correctly; perhaps it's the use of the Metropolis and categorical steps in the older one.

Here’s a distplot of 2 of the ‘mu’ traces, showing bimodality:
[image: distplot of two of the mu traces]

It almost looks like sign-switching, but it's not exactly symmetric about 0.
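
(For completeness, this is roughly how I'm making these plots — a quick sketch assuming the trace from the notebook and seaborn:)

import seaborn as sns
import matplotlib.pyplot as plt

# trace['mu'] stacks all chains; one column per mixture component
sns.distplot(trace['mu'][:, 0], label='mu[0]')
sns.distplot(trace['mu'][:, 1], label='mu[1]')
plt.legend()
plt.show()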

And these are the first 3 ‘w’:
[image: distributions of the first three w]

It seems the label switching only occurs between chains, not within a chain.
Sampling a single chain fixed the problem for me.
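
Concretely, something like this (argument names may differ slightly between PyMC3 versions):

with model:
    # A single chain has nothing to switch labels against, so the
    # marginal posteriors of mu stay unimodal.
    trace = pm.sample(5000, tune=1000, chains=1)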

[image: result from sampling a single chain]

Enforcing an "ordered" transform on "mu" also works (it breaks the label-permutation symmetry by constraining mu[0] < mu[1] < … < mu[K-1]), but I haven't let it run for enough samples yet, as it takes much longer.

mu = pm.Normal('mu', 0, tau=lambda_ * tau, shape=K,
               transform=pm.distributions.transforms.Ordered(),
               testval=np.linspace(-1, 1, K))

[image: result with the ordered transform on mu]

ADVI wasn’t working, even with the Ordered transform, etc.
Using the "SignFlipMeanFieldGroup" from the "Unique Solution for bPCA" thread allowed ADVI to infer normal components (see the code below).

[image: normal components inferred by ADVI with SignFlipMeanFieldGroup]

from pymc3.variational import approximations
from pymc3.variational.opvi import Group, node_property
from pymc3.util import update_start_vals  # may live in a different module in older PyMC3 versions
import theano.tensor as tt
import theano
import numpy as np
import pymc3 as pm


@Group.register
class SignFlipMeanFieldGroup(approximations.MeanFieldGroup):
    """Mean-field group whose approximation is symmetric under mu -> -mu."""
    __param_spec__ = dict(smu=('d',), rho=('d',))
    short_name = 'signflip_mean_field'
    alias_names = frozenset(['sfmf'])

    @node_property
    def mean(self):
        return self.params_dict['smu']

    def create_shared_params(self, start=None):
        # Same as MeanFieldGroup.create_shared_params, except the mean
        # parameter is registered as 'smu' to match __param_spec__ above.
        if start is None:
            start = self.model.test_point
        else:
            start_ = start.copy()
            update_start_vals(start_, self.model.test_point, self.model)
            start = start_

        if self.batched:
            start = start[self.group[0].name][0]
        else:
            start = self.bij.map(start)

        rho = np.zeros((self.ddim,))

        if self.batched:
            start = np.tile(start, (self.bdim, 1))
            rho = np.tile(rho, (self.bdim, 1))

        return {'smu': theano.shared(pm.floatX(start), 'smu'),
                'rho': theano.shared(pm.floatX(rho), 'rho')}

    @node_property
    def symbolic_logq_not_scaled(self):
        # q(z) is an equal-weight mixture of N(mean, std) and N(-mean, std),
        # so the density is invariant to flipping the sign of the mean.
        z = self.symbolic_random
        logq = -tt.log(2.) + tt.log(
            tt.exp(pm.Normal.dist(self.mean, self.std).logp(z)) +
            tt.exp(pm.Normal.dist(-self.mean, self.std).logp(z)))
        return logq.sum(range(1, z.ndim))

I used the reduce_rate learning-rate schedule from the LDA AEVB example, which got ADVI to converge much faster to the results obtained from NUTS.

with model:
    # Use the sign-flip family for mu and plain mean-field for everything else.
    symmetric_group = pm.Group([mu], vfam='sfmf')
    other_group = pm.Group(None, vfam='mf')  # None -> all remaining variables
    approx = pm.Approximation([symmetric_group, other_group])
    inference = pm.KLqp(approx)

η = .001
s = theano.shared(η)


def reduce_rate(a, h, i):
    # Callback signature is (approximation, loss history, iteration);
    # decay the shared learning rate as the iteration count grows.
    val = η / ((i / len(old_faithful_df)) + 1) ** .7
    s.set_value(val)


inference.fit(20000, obj_optimizer=pm.sgd(learning_rate=s), callbacks=[reduce_rate])

[image: ADVI result with the reduce_rate schedule]

Yeah, you need to kinda lie to yourself and sample only one mode… There have been efforts to improve the sampling in that notebook, but we never really got it fixed… https://github.com/pymc-devs/pymc3/pull/2956