Enforcing monotonic constraints on effect size parameters

Hi all,

Is there any way to enforce monotonic constraints on parameters?

Here’s part of my model definition:

\begin{aligned} R_i &\sim \text{Normal}(\mu_i, \sigma_R)\\ \mu_i &= Z_{\text{GROUP}[i]}(\beta_1D_{1,i} + \beta_2D_{2,i} + \cdots + \beta_MD_{M,i})\\ \sigma_R &\sim \text{Exponential}(1)\\ \vec{\beta} &\sim \text{MVNormal}\left(\begin{pmatrix}0\\0\\\vdots\\0\end{pmatrix}, \textbf{K}\right)\\ \cdots \end{aligned}

As you can see, right now I’m sampling \vec{\beta}, the vector of effect sizes, from a multivariate normal. I use the covariance matrix to make sure that \beta_{m-1} \approx \beta_m \approx \beta_{m+1}.

BUT what I really need is \beta_{m-1} > \beta_{m} > \beta_{m+1}. Is there any way for me to implement this “monotonic” constraint on the effect sizes?

Hi there,

I’m not 100% sure I understood what you’re looking for, but one pattern I’ve used before to enforce monotonic increases is to model:

\beta_1 \sim \mathcal{N}(\mu, \sigma^2)

and then to model the others as offsets from some positive distribution, e.g.:

\delta_i \stackrel{iid}{\sim} \text{Gamma}(\alpha, \theta)

where i ranges from 1 to M - 1 in your case. Then you can define \beta_2 and so on as \beta_1 plus the cumulative sums of the \delta parameters, so that \beta_2 = \beta_1 + \delta_1, \beta_3 = \beta_1 + \delta_1 + \delta_2, and so on. What do you think, would that work? Sorry, I’d hoped to give you cleaner notation, but hopefully you get the idea.
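Just to make the construction concrete, here is a quick NumPy sketch of one prior draw (all hyperparameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

M = 6                    # number of effect sizes (illustrative)
mu, sigma = 0.0, 1.0     # assumed prior parameters for beta_1
alpha, theta = 2.0, 0.5  # assumed Gamma shape/scale for the offsets

# One prior draw: beta_1, then beta_1 plus cumulative sums of
# strictly positive increments delta_1, ..., delta_{M-1}
beta_1 = rng.normal(mu, sigma)
deltas = rng.gamma(alpha, theta, size=M - 1)   # each delta_i > 0
beta = beta_1 + np.concatenate(([0.0], np.cumsum(deltas)))

# beta is strictly increasing; negate the cumulative sums (or reverse
# the vector) if you need beta_{m-1} > beta_m > beta_{m+1} instead.
assert np.all(np.diff(beta) > 0)
```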

Hey Martin,

Yes! You’ve understood my problem exactly. I think that should work, but I do have a question about this pattern. For large values of m, have you run into any issues from accumulating the variance from \delta_1, \delta_2, \ldots, \delta_m? I ask because I don’t necessarily want \text{var}(\beta_m) < \text{var}(\beta_{m+1}).

Keith

Hi Keith,

Hmmm, I see your point. For me personally I don’t think that ended up being an issue, but it’s true that you’ll end up with higher variance for larger m. My only thought is that you could choose a prior on the \delta with fairly low variance if you expect the \beta to all be quite similar, but maybe someone has a better idea!
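To put a number on Keith’s point, here is a quick NumPy simulation of prior draws (hyperparameters made up again). With \beta_1 held fixed, the prior variance of \beta_m grows linearly in m, as (m-1)\,\alpha\theta^2 for Gamma(shape \alpha, scale \theta) offsets:

```python
import numpy as np

rng = np.random.default_rng(1)

M, n_draws = 6, 200_000
alpha, theta = 2.0, 0.5   # Gamma(shape, scale); var(delta) = alpha * theta**2 = 0.5

# Many prior draws of the cumulative-sum construction, with beta_1
# fixed at 0 to isolate the variance contributed by the deltas
deltas = rng.gamma(alpha, theta, size=(n_draws, M - 1))
betas = np.concatenate([np.zeros((n_draws, 1)),
                        np.cumsum(deltas, axis=1)], axis=1)

# Empirical variance of beta_m climbs roughly as (m - 1) * 0.5 here
emp_var = betas.var(axis=0)
print(np.round(emp_var, 3))
```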

Yeah, I may need to accept that large m produces more variance and just try to minimize it.

This was a helpful conversation! Thank you! 🙂

Another common way to do it is to model the largest parameter \beta_M with whatever distribution you like (e.g. normal), and then use a Dirichlet distribution to get “weights” that you add up cumulatively to get the i-th effect. It has the disadvantage that it doesn’t allow some parameters to be positive and some negative, though. brms does this in the R world: Estimating Monotonic Effects with brms
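A rough NumPy sketch of that idea (the flat Dirichlet concentration and the value of the largest effect are made up; in a real model the largest effect would get its own prior, e.g. a normal):

```python
import numpy as np

rng = np.random.default_rng(2)

M = 6
b = -3.0            # the "largest" (in magnitude) effect; assumed value
alpha = np.ones(M)  # assumed flat Dirichlet concentration

zeta = rng.dirichlet(alpha)   # simplex weights: zeta_k > 0, sum to 1
beta = b * np.cumsum(zeta)    # beta_i = b * sum_{k <= i} zeta_k

# The sequence moves monotonically from near 0 to exactly b, so every
# beta_i shares the sign of b -- the limitation mentioned above.
assert np.isclose(beta[-1], b)
```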

You can translate and multiply to get results in any range. For example, if the range of values you want is (L, U), you can sample

\theta \sim \text{Dirichlet}(\alpha)

and set

\phi = (U - L) \cdot \theta + L.

The marginals of a Dirichlet are beta distributions, but they’re not uniform, and the components are correlated because of the sum-to-one constraint on the simplex \theta.
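For example, applying that affine map to the cumulative sums of \theta (which is, I assume, where the ordering comes from in this construction) gives an ordered vector in the target range:

```python
import numpy as np

rng = np.random.default_rng(3)

L, U = -2.0, 5.0         # assumed target range
alpha = np.full(6, 2.0)  # assumed Dirichlet concentration

theta = rng.dirichlet(alpha)          # simplex: theta_k > 0, sum to 1
phi = (U - L) * np.cumsum(theta) + L  # ordered values in (L, U], ending at U

assert np.all(np.diff(phi) > 0)
assert L < phi[0] and np.isclose(phi[-1], U)
```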

If you want to get more detailed here, you can parameterize on an unconstrained scale with an isometric log ratio (ILR) transform. I think @aseyboldt was discussing the ILR here, but I can’t find the post. It’s also under discussion for inclusion directly in PyMC: Implement `Ordered` distribution factory · Issue #7297 · pymc-devs/pymc · GitHub
