Constrain one variable to be greater than another

drbenvincent · January 28, 2018, 11:57am

I have a model where there is this mixture-esque deterministic equation

(p * exp(-k1*d)) + ((1-p) * exp(-k2*d))

where p, k1, and k2 are stochastic variables and d is data. The samples being returned are seemingly sensible, it’s just there’s no chain convergence. For example, samples of chain 1 of k1 seem to be similar to samples of chain 2 of k2, where what you really want are chain 1 and chain 2 samples of k1 to be the same. Hope that makes sense?

I think the solution is to constrain k1<k2. Is this possible in PyMC3?

Or is it simpler to re-parameterise? Perhaps replacing k2 with k1+b where b is positive, to give

(p * exp(-k1*d)) + ((1-p) * exp(-(k1+b)*d))

junpenglao · January 28, 2018, 1:02pm

Yes, you are seeing label switching in your trace. It happens a lot with mixture models.
One of the must-reads related to this topic is Michael Betancourt’s Stan case study Identifying Bayesian Mixture Models. I have a Gist here replicating all the experiment in PyMC3.

As an experiment, you can also switch the trace by hand. We have a discussion here in this post, with code here.

Something related is if you are sampling an Ordered and bounded vector (e.g., x1 < x2 ~ Uniform), current implementation seems to be biased (Sampling uniformly in a triangular support). But for continous RV it should be fine.

drbenvincent · January 28, 2018, 1:05pm

Awesome. Have implemented reparameterization and it seems to be working just fine

drbenvincent · January 28, 2018, 8:53pm

Lol, it was working fine on the basis of some posterior prediction. But then I saw this

The original question is still answered

Just looks like it’s not going to be simple to make this model work

drbenvincent · January 29, 2018, 11:17am

Note to future self
Yes, you can use a transformation (as pointed out by @junpenglao) with example here. Initially it looks a bit ARGH, but it’s pretty simple:

initially I had two variables k1 and k2 and wanted them to be ordered.
solution 1 was to reparameterize so that k2=k1+b where b>0. This seemed to work judging by some posterior predictive plots, but the chains were in fact all over the place.
so solution 2 was to use the ordered transform in the link above. It’s pretty simple… just add the Ordered class, then add something like , transform=Ordered(), testval=[0.1, 0.2] into the definition of the PyMC3 stochastic variable

More details at #2 on the repo

sammosummo · January 29, 2018, 4:25pm

This is very interesting

It’s clear to me why solution #1 produces OK predictions but poor parameter estimates. Essentially there is too much prior density on low values of b (perhaps the prior was a half-normal or something). When b is close to 0, the value of k1 needs to be close to what k2 should be.

Could someone provide an intuitive explanation for why solution #2 works better?

drbenvincent · January 29, 2018, 5:53pm

Yes, that makes sense in fact. Yep, it was a HalfNormal on b, and looking at the traces, if I had the sigma higher then it could have worked and perhaps masked the more fundamental problem?

sammosummo · January 29, 2018, 7:01pm

If you ran it with a uniform prior on b, say b ~ N (0, 100), I imagine you would get better mixing. I would like to understand intuitively why solution #2 works, but I don’t currently.

drbenvincent · January 29, 2018, 7:14pm

Likewise. Can anyone else help us with the intuition?

junpenglao · January 29, 2018, 7:42pm

IMO the proper explanation is in Betancourt’s case study Identifying Bayesian Mixture Models
Especially these two paragraphs:

This identification also has a welcome geometric interpretation. The region of parameter space satisfying a given ordering constraint, such as α_1≤ … ≤ α_K, defines a square pyramid with the apex point at zero. The K-dimensional parameter space neatly decomposes into K! of these pyramids, each with a distinct ordering and hence association with a unique labeling.

When the priors are exchangeable the mixture posterior aliases across each of these pyramids in parameter space: if we were given the mixture posterior restricted to one of these pyramids then we could reconstruct the entire mixture distribution by simply rotating that restricted distribution into each of the other K!−1 pyramids. As we do this we also map the mode in the restricted distribution into each pyramid, creating exactly the expected K! multimodality. Moreover, those rotations are exactly given by permuting the parameter indices and reordering the corresponding parameter values.

You can loosely think of it as cutting out some of the regions in your model parameter space so the sampler dont wonders around and becomes more efficient.

sammosummo · January 29, 2018, 8:40pm

Thanks!

So is this generally true of any situation where you have two or more free parameters that must be ordered in a certain way?

junpenglao · January 29, 2018, 9:01pm

Yes. In some occasion where you have strong information (in your prior or likelihood), you might be able to get away with not using a ordered vector. But in general it helps with model convergent.

drbenvincent · January 29, 2018, 9:06pm

Bler. I think I’m going to need to sketch that out in order to follow that explanation. Unless someone wants to write a blog post with animated explanatory diagrams

sammosummo · January 29, 2018, 9:40pm

@drbenvincent did you code up both versions of the model? It might be illuminating to see side-by-side comparisons of them applied to the same data.

drbenvincent · January 30, 2018, 7:52pm

No, I just updated and moved on. My primary goal at the moment is to get things working, but maybe this is something I can come back to in order to get a bit more insight

Topic		Replies	Views
Constraining order in a Mixture model Questions	6	1382	August 9, 2021
Constrain the sum of two categorical variables Questions modeling	3	771	January 24, 2022
Label switching for multicomponent model modeling , nutpie	3	120	October 13, 2024
Constrain likelihood based on covariates v5 prior , modeling	3	48	January 9, 2025
Hierarchical model where convergence is difficult: chains look bad. Any tips? Questions	6	1018	January 10, 2019

Constrain one variable to be greater than another

Related topics