# Constraining order in a Mixture model

I am trying to create a mixture model with hyperparameters. Eventually, mu for the two components will be computed using different equations, but for the moment they are the same while I resolve my problem with label switching. My model is defined as:

```python
import numpy as np
import pymc3 as pm

# selected is a DataFrame with 'ln_depth' and 'ln_vel' columns
with pm.Model() as model:
    coef = pm.Normal('coeff', 6, 3, shape=2)
    power = pm.HalfCauchy('exp1', 1, shape=2)
    std = pm.HalfCauchy('std', 2)

    # Prevent label switching with the ordered transform
    norm1 = pm.Normal.dist(
        mu=(coef[0] + power[0] * selected['ln_depth']), sigma=std
    )
    norm2 = pm.Normal.dist(
        mu=(coef[1] + power[1] * selected['ln_depth']), sigma=std
    )

    w = pm.Dirichlet('w', a=np.array([1, 1]))

    like = pm.Mixture('like', w=w, comp_dists=[norm1, norm2],
                      observed=selected['ln_vel'])

with model:
    simple = pm.find_MAP()
    idata = pm.sample(start=simple, return_inferencedata=True)
```


However, this results in the chains jumping between modes.

I looked into using pm.Potential and pm.transforms.ordered, but I don’t see how they can resolve this issue. Any suggestions?

The problem comes down to non-identifiability, i.e. multiple sets of parameters yield the same outcome and the likelihood is indifferent across several modes. The two ways of fixing this that I know are:

1. Choose smarter priors that constrain your model such that label switching goes away.

2. Break the symmetry explicitly, for example by imposing an ordering on the component parameters.
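To make the non-identifiability concrete, here is a minimal NumPy/SciPy sketch (illustrative numbers, not from this thread): swapping both the component means and the weights of a two-component mixture leaves the likelihood unchanged, so the posterior has two equally good modes and chains can hop between them.

```python
import numpy as np
from scipy.stats import norm

# illustrative data, not from the thread
data = np.array([0.1, 2.3, -0.4, 1.9])

def mixture_loglike(w, mus, sigma, x):
    # log-likelihood of a two-component Gaussian mixture
    dens = w[0] * norm.pdf(x, mus[0], sigma) + w[1] * norm.pdf(x, mus[1], sigma)
    return np.log(dens).sum()

ll_a = mixture_loglike([0.3, 0.7], [0.0, 2.0], 1.0, data)
ll_b = mixture_loglike([0.7, 0.3], [2.0, 0.0], 1.0, data)  # labels swapped
print(ll_a, ll_b)  # identical: the likelihood cannot tell the labelings apart
```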

Personally, I like the second option better. @AustinRochford recently wrote a blog post on how to do this that I really like, which you can find here. However, he looks at factor analysis so I’m not sure how this translates to your case. There is also some more advice on mixture models in this thread.

I don’t immediately see how pm.Potential or pm.transform.ordered would be of any use here.


Thanks for the links. Regarding pm.Potential and pm.transforms.ordered, I was hoping to use them to force coef[0] to be less than coef[1].

You want to add transform=pm.transforms.ordered to the constructor for coef. See the examples in Chapter 11 of the PyMC3 port of Statistical Rethinking.


Thanks, transform=pm.transforms.ordered in the constructor works, but note for anyone else using this: you also need to supply testval values when you apply this constraint.
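The testval requirement makes sense given how the ordered transform works: it keeps the first element and maps the remaining elements to the logs of successive differences, so the starting point must be strictly increasing (a repeated default value would give log(0)). A minimal NumPy sketch of the forward/backward maps (my own reimplementation for illustration, not PyMC3's code):

```python
import numpy as np

def ordered_forward(x):
    # increasing vector -> unconstrained: keep x[0], log-differences after it
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = np.log(np.diff(x))
    return y

def ordered_backward(y):
    # unconstrained -> increasing vector: cumulative sum of exponentiated gaps
    x = np.empty_like(y, dtype=float)
    x[0] = y[0]
    x[1:] = y[0] + np.cumsum(np.exp(y[1:]))
    return x

x = np.array([5.0, 7.0])  # a valid, strictly increasing starting value
roundtrip = ordered_backward(ordered_forward(x))
```

Any unconstrained vector maps back to a strictly increasing one, which is what removes the label-switching mode.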

One more related question: what about adding constraints on what pm.Normal.dist() returns? This question really stems from not fully understanding what the dist() method provides. I wanted to do something like this:

```python
# Here depth is the predictor variable
dist1 = pm.Normal(mu=(c1[0] + c2[0] * depth), sigma=std).dist()
dist2 = pm.Normal(mu=(c1[1] + c2[1] * depth), sigma=std).dist()
# Want dist2 to always be greater than dist1. This doesn't work.
ordered = pm.Potential('ordered', tt.switch(dist1 - dist2 < 0, -np.inf, 0))
```


I am essentially trying to perform both a classification and a regression simultaneously.


What is the error you’re getting? I can’t construct dist1 and dist2 that way (I am on 4.0). Can you also provide more context about exactly what you’re trying to achieve here? This, however, works for me.

When I try to reproduce your example, it also throws an error, but I am on version 3.11.

I will look into trying this with PyMC 4.0.

This might be too specific, but hopefully it helps explain what I am attempting to do. I am developing a model for velocity profiles that depend on material type. The velocity is typically modeled as V = c_1 \cdot z ^ {c_2}, where z is depth. Unfortunately, the material classification is poor at this site, so I am trying to break the data into two (or three) materials, where each material should have a different c_1 and c_2. In the PyMC3 model, I am using a marginalized mixture model with a mean that varies with depth and is computed from c_1 and c_2. I would like to constrain the means such that one model is always greater than the other.
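Taking logs linearizes the power law: \ln V = \ln c_1 + c_2 \ln z, which is exactly the mu = coef + power * ln_depth form used in the model at the top of the thread (with coef playing the role of \ln c_1). A quick numerical check with illustrative coefficients:

```python
import numpy as np

c1, c2 = 120.0, 0.35            # illustrative coefficients, not fitted values
z = np.linspace(1.0, 30.0, 50)  # depth

v = c1 * z ** c2                          # power-law velocity profile
ln_v_linear = np.log(c1) + c2 * np.log(z)  # linear in ln(z)
```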

In playing with the model construction, I noticed that the NUTS sampler doesn’t seem to work well with a pm.Potential(...). However, if I change to step=pm.Metropolis(), then the sampling works. I didn’t see any documentation that would have led me to this change.
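One likely explanation: NUTS uses gradients of the log-posterior, and a Potential that returns -inf introduces a hard, non-differentiable cliff, whereas Metropolis only evaluates the density value. A common workaround (my suggestion, not something stated in this thread) is to replace the hard rejection with a smooth, steep penalty so the log-posterior stays differentiable; sketched here in plain NumPy:

```python
import numpy as np

def hard_penalty(diff):
    # reject outright when the ordering is violated (diff < 0): no gradient
    return np.where(diff < 0, -np.inf, 0.0)

def soft_penalty(diff, scale=100.0):
    # differentiable: heavily penalize violations instead of rejecting them
    return np.where(diff < 0, -scale * diff ** 2, 0.0)

d = np.array([-0.5, 0.0, 0.5])
hard = hard_penalty(d)  # [-inf, 0., 0.]
soft = soft_penalty(d)  # [-25., 0., 0.]
```

The same switch expression can be written with the backend's math functions inside a model; the key point is avoiding the -inf discontinuity when using a gradient-based sampler.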