Rejecting/thinning modes based on average density


#1

If I have two well-separated modes, they are essentially mutually exclusive and exhaustive. In this case, at least naively, I can approximate each mode's probability mass by averaging the density over the samples in that mode.

Is it a legitimate technique to reject chains stuck in a less probable mode based on the relative mass approximated as above, e.g. if the mass ratio exceeds the sample size?

Further, is it legitimate to thin out the chains based on the mode mass ratio approximated as above?
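To make the proposal concrete, here is a NumPy sketch of what I have in mind (toy 1-D mixture, illustrative numbers and helper names of my own):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a 0.9/0.1 Gaussian mixture with modes at 0 and 10;
# "density" plays the role of exp(logp) evaluated at the samples.
def density(x):
    return (0.9 * np.exp(-0.5 * x**2)
            + 0.1 * np.exp(-0.5 * (x - 10.0)**2)) / np.sqrt(2 * np.pi)

chain_a = rng.normal(0.0, 1.0, size=1000)    # chain stuck in the big mode
chain_b = rng.normal(10.0, 1.0, size=1000)   # chain stuck in the small mode

# Proposed mode-mass proxy: average density over each chain's samples.
mass_a = density(chain_a).mean()
mass_b = density(chain_b).mean()

ratio = mass_a / mass_b          # ~9, matching the 0.9/0.1 weights here
# Rejection rule from the question: drop the minor chain only if the
# ratio exceeds the sample size.
reject_b = ratio > len(chain_b)  # False in this toy case
```

Note the two modes here have the same shape, which is exactly the case where averaging the density happens to work.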


#2

I don't think that is an appropriate way to do it. If different chains are completely stuck in different modes, how can you be sure the density is estimated correctly? The estimate would be biased to begin with.


#3

Just to make sure we are on the same page: by density I mean exp(logp()) evaluated at the sample points.


#4

So you are trying to scale the marginal posterior (e.g., a smoothed histogram) by exp(logp())? Beyond some simple cases, I don't think this approach is sound.
Just take a simple example:

import numpy as np
import pymc3 as pm

with pm.Model():
    # Symmetric 50/50 mixture of two unit-variance normals at 0 and 5
    pm.NormalMixture('m',
        w=np.array([.5, .5]),
        mu=np.array([0., 5.]),
        sd=np.array([1., 1.]))
    trace = pm.sample(1000)
pm.traceplot(trace);

As an example, take the yellow chain and the green chain: the logp(x) at each point would be the same even though the marginal densities have different shapes. And the situation only gets worse in higher dimensions.
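The failure mode is easiest to see when the modes have equal mass but different widths. A self-contained sketch (my own toy mixture, not the model above): the average-density proxy reports the height of a mode, not its mass:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Equal-mass mixture: a wide mode at 0 (sd=1), a narrow mode at 10 (sd=0.1).
def density(x):
    return (0.5 * stats.norm.pdf(x, 0.0, 1.0)
            + 0.5 * stats.norm.pdf(x, 10.0, 0.1))

wide = rng.normal(0.0, 1.0, size=5000)     # chain stuck in the wide mode
narrow = rng.normal(10.0, 0.1, size=5000)  # chain stuck in the narrow mode

ratio = density(narrow).mean() / density(wide).mean()
# True mass ratio is 1, but the average-density proxy says the narrow
# mode is roughly 10x "heavier" -- it measures peak height, not mass.
```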


#5

You are right.
Also, I just checked my maths: what I proposed is not the right way to do it.


#6

But it seems that I could get what I want if I could estimate the volume of each mode's support, he-he-he…
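For roughly Gaussian modes, one cheap volume estimate is a per-mode Laplace-style correction: peak density times the Gaussian volume implied by the within-chain spread. A 1-D sketch (toy mixture and helper names of my own, continuing the wide/narrow example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Equal-mass mixture: wide mode at 0 (sd=1), narrow mode at 10 (sd=0.1).
def density(x):
    return (0.5 * stats.norm.pdf(x, 0.0, 1.0)
            + 0.5 * stats.norm.pdf(x, 10.0, 0.1))

wide = rng.normal(0.0, 1.0, size=5000)
narrow = rng.normal(10.0, 0.1, size=5000)

def laplace_mass(chain):
    # Peak height times Gaussian volume, with location and scale
    # estimated from the chain itself.
    mu, sd = chain.mean(), chain.std()
    return density(mu) * np.sqrt(2 * np.pi) * sd

ratio = laplace_mass(narrow) / laplace_mass(wide)
# With the volume correction the ratio is close to the true value 1.
```

In d dimensions the volume factor becomes (2*pi)**(d/2) * sqrt(det(cov)), which of course degrades as the modes get less Gaussian.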


#7

I guess it depends on what you plan to do. If you only care about minimizing prediction error, focusing on a single posterior mode does not necessarily give you bad performance (think of the MLE or a Laplace approximation: you essentially ignore all the smaller modes and use what is hopefully the global maximum).


#8

First and foremost I want to compare mode masses. I have chains sampling in regions around two different solutions: do they have comparable probability, or is one of them just a small bump compared to the other that we can safely ignore?

BTW, I now use 'advi+adapt_diag' for init. What do you think: could disabling the adapt_diag part help with switching between modes?


#9

Maybe you can try SMC; it might give a better weighting with lots of chains.

ADVI tends to underestimate the variance, so if anything you should try the default jitter+adapt_diag. Ideally you want the energy proposal to be large enough to mix across the different modes. It might also be interesting to try adapt_diag_grad as the initialization.
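For reference, both suggestions are one-liners in PyMC3 (a sketch: assumes a recent PyMC3 version where pm.sample_smc is available, and a model context like the mixture example above):

```python
import pymc3 as pm

with pm.Model() as model:
    import numpy as np
    pm.NormalMixture('m', w=np.array([.5, .5]),
                     mu=np.array([0., 5.]), sd=np.array([1., 1.]))

    # SMC: many weighted particles, better at populating separated modes.
    trace_smc = pm.sample_smc(2000)

    # Plain NUTS with the default init, avoiding ADVI's variance shrinkage.
    trace_nuts = pm.sample(1000, init='jitter+adapt_diag')
```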