Label switching in Hidden Markov Models

Hi!

I implemented a hidden Markov model with Gaussian emissions to draw posterior samples from multiple sequences of observations. My chains seem to be converging, but I am running into label switching problems. Below you can find the code and results.

  • Code
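import numpy as np
import pymc as pm
import aesara.tensor as at  # assumption: PyMC v4; with PyMC v5 use pytensor.tensor instead

N_states = 8
coord = {'emissions': np.arange(N_states)}

with pm.Model(coords=coord) as model:

    # `observations` (the raw sequences) is defined elsewhere
    observations = pm.Data('data', observations, mutable=True)

    # flat Dirichlet priors on each row of the transition matrix and on the initial state
    Pt = pm.Dirichlet("p_transition", np.ones((N_states, N_states)), shape=(N_states, N_states))
    P0 = pm.Dirichlet("p_init", np.ones((N_states,)), shape=(N_states,))

    logp_initial_state = at.log(P0)
    logp_transition = at.log(Pt)

    # emission parameters, one (mu, sigma) pair per hidden state
    mu = pm.Normal('mu', mu=[-30, -25, -20, -16, -12, -9, -6, -5], sigma=[2]*5 + [1]*3, dims='emissions')
    sigma = pm.InverseGamma('sigma', alpha=40, beta=80, dims='emissions')

    # hmm_logp_value_grad_op is defined elsewhere (not shown here)
    loglike = pm.Potential("hmm_loglike", hmm_logp_value_grad_op(observations, mu, sigma, logp_initial_state, logp_transition))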

After some research, I found that label switching is a common problem in mixture models and is caused by a symmetry of the likelihood in the model parameters, but the reasoning behind it is still not very clear to me. Can anyone recommend some literature on this topic?
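
To make the symmetry concrete, here is a minimal NumPy sketch. It is not the `hmm_logp_value_grad_op` from the model above (that code is not shown); it is a plain forward-algorithm log-likelihood for a Gaussian HMM, with made-up function names and toy numbers. It evaluates the likelihood once with the original parameters and once with the state labels permuted.

```python
import numpy as np

def norm_logpdf(x, mu, sigma):
    # log density of Normal(mu, sigma) evaluated at x
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def logsumexp(a, axis):
    # numerically stable log(sum(exp(a))) along one axis
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def hmm_loglike(obs, mu, sigma, p_init, p_trans):
    # forward algorithm in log space; p_trans[i, j] = P(next = j | current = i)
    log_emit = norm_logpdf(obs[:, None], mu, sigma)  # shape (T, K)
    log_alpha = np.log(p_init) + log_emit[0]
    for t in range(1, len(obs)):
        log_alpha = logsumexp(log_alpha[:, None] + np.log(p_trans), axis=0) + log_emit[t]
    return logsumexp(log_alpha, axis=0)

rng = np.random.default_rng(1)
obs = rng.normal(size=20)  # toy observations

mu = np.array([-1.0, 0.5, 2.0])
sigma = np.array([0.5, 1.0, 1.5])
p_init = np.array([0.2, 0.3, 0.5])
p_trans = np.array([[0.8, 0.1, 0.1],
                    [0.2, 0.7, 0.1],
                    [0.3, 0.3, 0.4]])

perm = np.array([2, 0, 1])  # an arbitrary relabelling of the three states

ll = hmm_loglike(obs, mu, sigma, p_init, p_trans)
ll_perm = hmm_loglike(obs, mu[perm], sigma[perm], p_init[perm],
                      p_trans[np.ix_(perm, perm)])  # permute rows and columns together

print(np.isclose(ll, ll_perm))
```

The two values agree because the likelihood sums over all hidden-state paths, so renaming the states only reorders the terms of that sum. With priors that treat the states symmetrically this gives one equivalent posterior mode per permutation, and separate chains can settle in different ones. An informative prior on `mu` would, as far as I understand, only break the symmetry where the priors of neighbouring states barely overlap.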

Why does this happen and what can I do to prevent this?

Thank you in advance.

This post seems to offer a good solution but, as noted there, we would need to overwrite the data in the trace object to swap the sampled values for the switched dimensions. Is it possible to edit the data of an ArviZ data structure?

I want to do something like:

idata2 = idata.copy()
idata2.sel(chain=[1]).posterior['mu'][0][:,0] = idata2.sel(chain=[1]).posterior['mu'][0][:,1]

But after doing this, idata2 is unchanged.
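
A likely explanation, sketched here with plain xarray: `sel` returns a new object, and with a list indexer its values are a copy, so an assignment into the result never reaches the original data. Assigning through `.loc` on the variable itself does modify the dataset in place. The `posterior` Dataset below is a hypothetical stand-in for `idata2.posterior`; the dimension names and toy values are assumptions.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
n_draws = 50

# toy draws for 2 chains and 3 states; chain 1 has states 0 and 1 swapped
mu = rng.normal(loc=[-30.0, -20.0, -10.0], scale=0.1, size=(2, n_draws, 3))
mu[1] = mu[1][:, [1, 0, 2]]

posterior = xr.Dataset(
    {"mu": (("chain", "draw", "emissions"), mu)},
    coords={"chain": [0, 1], "draw": np.arange(n_draws), "emissions": np.arange(3)},
)

# 1) Assigning into the result of sel() writes into a temporary copy
before = posterior["mu"].values.copy()
posterior.sel(chain=[1])["mu"][0][:, 0] = 0.0
sel_changed_data = not np.array_equal(before, posterior["mu"].values)
print(sel_changed_data)  # the original dataset is untouched

# 2) Assigning through .loc on the variable modifies the dataset in place
swapped = posterior["mu"].sel(chain=1).values[:, [1, 0, 2]]  # fancy indexing makes a copy
posterior["mu"].loc[dict(chain=1)] = swapped

chain_means = posterior["mu"].mean("draw").values  # shape (chain, emissions)
print(chain_means.round(1))
```

With a real trace, the same `.loc` assignment should work on `idata2.posterior["mu"]`, since each group of the trace is an xarray Dataset. Any other variable indexed by state (`sigma`, `p_init`, and both axes of `p_transition`) needs the same permutation so the labels stay consistent.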

Could you use an ordered transform on mu?

Thank you for your answer!

I cannot find in the documentation what that transform does to my variables. Either way, I implemented it and the inferences were much worse.
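
For reference, here is my understanding of what an ordered transform does, written as a NumPy sketch rather than PyMC's actual implementation. The sampler works on an unconstrained vector `y`; the constrained vector is rebuilt as a first element plus a cumulative sum of positive increments, so it is always strictly increasing. The function names below are made up for illustration.

```python
import numpy as np

def ordered_backward(y):
    """Map an unconstrained vector y to a strictly increasing vector x."""
    y = np.asarray(y, dtype=float)
    # first element is kept as-is, the rest become positive increments
    increments = np.concatenate([y[:1], np.exp(y[1:])])
    return np.cumsum(increments)

def ordered_forward(x):
    """Inverse map: a strictly increasing x back to an unconstrained y."""
    x = np.asarray(x, dtype=float)
    # log of the gaps; only defined when x is strictly increasing
    return np.concatenate([x[:1], np.log(np.diff(x))])

y = np.array([-30.0, 1.0, -2.0, 0.5])  # any real values are allowed here
x = ordered_backward(y)

print(x)                   # strictly increasing
print(ordered_forward(x))  # recovers y up to floating-point error
```

Because the forward map takes the log of the gaps, the starting values of `mu` have to be strictly increasing (in PyMC this can be set with `initval`). The transform also only restricts the support to ordered vectors; it does not change the Normal priors, which still apply to each component.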

I think that, in my case, the best way to handle this is by post-processing the trace. I don’t have mid-chain label switching, so I can just relabel my components.
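
A minimal sketch of that relabelling for a single chain, assuming there is no switching within the chain and that sorting the states by the chain mean of `mu` is an acceptable convention. The helper name and the toy draws are hypothetical; the inputs are plain NumPy arrays as obtained from `.values` for one chain.

```python
import numpy as np

def relabel_chain(mu, sigma, p_init, p_trans):
    """Reorder the states of one chain so that the chain means of mu increase.

    mu, sigma, p_init have shape (draw, K); p_trans has shape (draw, K, K).
    """
    perm = np.argsort(mu.mean(axis=0))  # one permutation for the whole chain
    return {
        "mu": mu[:, perm],
        "sigma": sigma[:, perm],
        "p_init": p_init[:, perm],
        # permute rows and columns of every transition matrix together
        "p_transition": p_trans[:, perm][:, :, perm],
        "perm": perm,
    }

# toy chain whose labels come out in the order [2, 0, 1]
rng = np.random.default_rng(0)
n_draws = 200
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.3, 0.3, 0.4]])
scrambled = [2, 0, 1]

mu = rng.normal([-30.0, -20.0, -10.0], 0.1, size=(n_draws, 3))[:, scrambled]
sigma = np.tile([1.0, 2.0, 3.0], (n_draws, 1))[:, scrambled]
p_init = np.tile([0.2, 0.3, 0.5], (n_draws, 1))[:, scrambled]
p_trans = np.tile(P[np.ix_(scrambled, scrambled)], (n_draws, 1, 1))

out = relabel_chain(mu, sigma, p_init, p_trans)
print(out["perm"])
print(out["mu"].mean(axis=0).round(1))
```

Running this once per chain and writing the results back into the trace gives every chain the same labelling. Sorting by the mean of `mu` is only one possible convention, and it relies on the state means being well separated within each chain.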