Wrong calculation/chart on DPMM example?

I noticed the chart pictured below from the DPMM example doesn’t seem to be showing “normal” components correctly.

[image: components chart from the current DPMM example]

In the previous iteration of the example, the syntax for defining the model was different, but it produced what I believe are the correct chart/components (pictured below).

It's clear that the "components" in the chart above aren't normally distributed but bimodal; the chart below shows proper normal components.

[image: components chart from the previous version of the example]

Am I misunderstanding something about the output of DPMMs? Shouldn’t the components reflect the type of the base distribution?
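
For reference, my understanding is that the truncated stick-breaking construction in the notebook gives a density of the form

$$p(x) = \sum_{k=1}^{K} w_k \, \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right), \qquad w_k = \beta_k \prod_{j < k} (1 - \beta_j), \quad \beta_k \sim \mathrm{Beta}(1, \alpha),$$

so each individual component should itself be normal, even if the overall mixture is multimodal.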

If I'm correct, and there's a problem with the example, how can the correct components be obtained from the new DPMM syntax?

It seems there is some label switching in the new figure; we should rerun the notebook and check the inference more carefully.


When I run the notebook, NUTS doesn't converge.
Some of the largest weights have a Gelman-Rubin statistic above 3.
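
For reference, here's roughly how I'm checking it — a minimal sketch assuming the model and trace variables from the notebook (the exact name of the R-hat column depends on the PyMC3 version):

import pymc3 as pm

with model:
    trace = pm.sample(1000, tune=1000)

# The R-hat (Gelman-Rubin) column flags the weights and means whose
# chains disagree with each other, i.e. the label switching.
print(pm.summary(trace))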

Should I try to identify the label switching and manually fix it, or do you suspect there is a problem with the implementation in the new notebook causing this? I'm curious why the older syntax works correctly; perhaps it's the use of the Metropolis and categorical steps in the older one.

Here’s a distplot of 2 of the ‘mu’ traces, showing bimodality:
[image: distplot of two of the mu traces]

It almost looks like sign-switching, but it's not exactly symmetric about 0.
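
(For completeness, this is roughly how I'm making these plots — a quick sketch assuming the trace from the notebook and seaborn:)

import seaborn as sns
import matplotlib.pyplot as plt

# trace['mu'] stacks all chains; one column per mixture component
sns.distplot(trace['mu'][:, 0], label='mu[0]')
sns.distplot(trace['mu'][:, 1], label='mu[1]')
plt.legend()
plt.show()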

And these are the first 3 ‘w’:
[image: distributions of the first three w]

It seems the label switching only occurs between chains, not within a chain.
Sampling a single chain fixed the problem for me.
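
Concretely, something like this (argument names may differ slightly between PyMC3 versions):

with model:
    # A single chain has nothing to switch labels against, so the
    # marginal posteriors of mu stay unimodal.
    trace = pm.sample(5000, tune=1000, chains=1)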

[image: result from sampling a single chain]

Enforcing an "ordered" transform on "mu" also works (it breaks the label-permutation symmetry by constraining mu[0] < mu[1] < … < mu[K-1]), but I haven't let it run for enough samples yet, as it takes much longer.

mu = pm.Normal('mu', 0, tau=lambda_ * tau, shape=K,
               transform=pm.distributions.transforms.Ordered(),
               testval=np.linspace(-1, 1, K))

[image: result with the ordered transform on mu]

ADVI wasn’t working, even with the Ordered transform, etc.
Using the "SignFlipMeanFieldGroup" from the "Unique Solution for bPCA" thread allowed ADVI to infer normal components (see the code below).

[image: normal components inferred by ADVI with SignFlipMeanFieldGroup]

from pymc3.variational import approximations
from pymc3.variational.opvi import Group, node_property
from pymc3.util import update_start_vals  # may live in a different module in older PyMC3 versions
import theano.tensor as tt
import theano
import numpy as np
import pymc3 as pm


@Group.register
class SignFlipMeanFieldGroup(approximations.MeanFieldGroup):
    """Mean-field group whose approximation is symmetric under mu -> -mu."""
    __param_spec__ = dict(smu=('d',), rho=('d',))
    short_name = 'signflip_mean_field'
    alias_names = frozenset(['sfmf'])

    @node_property
    def mean(self):
        return self.params_dict['smu']

    def create_shared_params(self, start=None):
        # Same as MeanFieldGroup.create_shared_params, except the mean
        # parameter is registered as 'smu' to match __param_spec__ above.
        if start is None:
            start = self.model.test_point
        else:
            start_ = start.copy()
            update_start_vals(start_, self.model.test_point, self.model)
            start = start_

        if self.batched:
            start = start[self.group[0].name][0]
        else:
            start = self.bij.map(start)

        rho = np.zeros((self.ddim,))

        if self.batched:
            start = np.tile(start, (self.bdim, 1))
            rho = np.tile(rho, (self.bdim, 1))

        return {'smu': theano.shared(pm.floatX(start), 'smu'),
                'rho': theano.shared(pm.floatX(rho), 'rho')}

    @node_property
    def symbolic_logq_not_scaled(self):
        # q(z) is an equal-weight mixture of N(mean, std) and N(-mean, std),
        # so the density is invariant to flipping the sign of the mean.
        z = self.symbolic_random
        logq = -tt.log(2.) + tt.log(
            tt.exp(pm.Normal.dist(self.mean, self.std).logp(z)) +
            tt.exp(pm.Normal.dist(-self.mean, self.std).logp(z)))
        return logq.sum(range(1, z.ndim))

I used the reduce_rate learning-rate schedule from the LDA AEVB example, which got ADVI to converge much faster to the results obtained from NUTS.

with model:
    # Use the sign-flip family for mu and plain mean-field for everything else.
    symmetric_group = pm.Group([mu], vfam='sfmf')
    other_group = pm.Group(None, vfam='mf')  # None -> all remaining variables
    approx = pm.Approximation([symmetric_group, other_group])
    inference = pm.KLqp(approx)

η = .001
s = theano.shared(η)


def reduce_rate(a, h, i):
    # Callback signature is (approximation, loss history, iteration);
    # decay the shared learning rate as the iteration count grows.
    val = η / ((i / len(old_faithful_df)) + 1) ** .7
    s.set_value(val)


inference.fit(20000, obj_optimizer=pm.sgd(learning_rate=s), callbacks=[reduce_rate])

[image: ADVI result with the reduce_rate schedule]

Yeah, you need to kinda lie to yourself and sample only one mode… There have been efforts to improve the sampling in that notebook, but we never really got it fixed… https://github.com/pymc-devs/pymc3/pull/2956