Truncated multivariate prior in hierarchical setting

Hi all

In short, my problem concerns using a truncated multivariate distribution as a prior in a hierarchical setting.

More specifically, I am working on a hierarchical linear model in which the intercepts and slopes are described by a bivariate distribution. To clarify, I am using a non-centered parameterization, as described here: https://twiecki.io/blog/2017/02/08/bayesian-hierchical-non-centered/. On top of that, the slopes need to be truncated at zero.
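Concretely, the prior I have in mind is

$$
\begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix} \sim \mathcal{N}\left( \boldsymbol{\mu}, \boldsymbol{\Sigma} \right), \qquad \beta_i > 0,
$$

where $\alpha_i$ and $\beta_i$ are the intercept and slope of group $i$.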

I tried to implement a solution as described here: Truncated/Bounded Multivariate Normal?. As expected, this did not work in a hierarchical model.

Do you have any suggestions?

Thank you in advance.

A couple of things you can try (a sketch of both follows the list):

  1. Add a potential to the model: pm.Potential("truncate", tt.switch(beta < 0., -np.inf, 0.)). It makes the logp go to -inf, so any sample with beta < 0 will never be accepted. The downside is that it introduces a hard boundary in the parameter space and sampling can be difficult (you get a lot of divergences if the posterior of beta is close to 0).
  2. Apply a transformation to beta, for example a softplus (it is better than a sigmoid because it is mostly linear). The downside is that beta no longer follows exactly a multivariate Gaussian prior.
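Both in (rough) code, on a toy non-centered setup — mu_beta, sigma_beta, z, and n_groups are placeholder names, not from your model:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

n_groups = 5  # placeholder

# Option 1: hard truncation via a potential.
with pm.Model() as model_potential:
    mu_beta = pm.Normal('mu_beta', mu=0., sd=1.)
    sigma_beta = pm.HalfNormal('sigma_beta', sd=1.)
    z = pm.Normal('z', mu=0., sd=1., shape=n_groups)
    beta = pm.Deterministic('beta', mu_beta + sigma_beta * z)
    # logp becomes -inf whenever any beta is negative, so such proposals are rejected
    pm.Potential('truncate', tt.switch(tt.any(beta < 0.), -np.inf, 0.))

# Option 2: softplus transform, which keeps beta positive smoothly.
with pm.Model() as model_softplus:
    mu_beta = pm.Normal('mu_beta', mu=0., sd=1.)
    sigma_beta = pm.HalfNormal('sigma_beta', sd=1.)
    z = pm.Normal('z', mu=0., sd=1., shape=n_groups)
    # softplus(x) = log(1 + exp(x)) is nearly linear for large x, so beta stays
    # close to Gaussian away from zero, but it is no longer exactly Gaussian
    beta = pm.Deterministic('beta', tt.nnet.softplus(mu_beta + sigma_beta * z))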

Thank you for your reply.

Some comments on your points:

  1. Using pm.Potential led to several divergences, as you predicted. Moreover, the hierarchical setting has a slope vector of size n_groups. If I were to use this solution, would I reject the entire slope vector if one instance is negative? This would be equivalent to using: pm.Potential("truncate", tt.switch(tt.any(slope < 0.), -np.inf, 0.))
  2. I can try this idea. However, I would also need to validate how close the sampled distribution is to the truncated bivariate Gaussian, and how sensitive the difference is to model changes (see the sketch after this list).
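For reference, a back-of-the-envelope version of that check for fixed population parameters (mu0 and sd0 are placeholder values):

import numpy as np

mu0, sd0 = 1.0, 1.0  # placeholder population mean/sd of the slope
z = np.random.normal(0., 1., size=100000)

# Softplus-transformed draws (suggestion 2) for fixed population parameters...
softplus_draws = np.log1p(np.exp(mu0 + sd0 * z))

# ...versus the truncated Gaussian target, via naive rejection sampling.
raw = np.random.normal(mu0, sd0, size=100000)
truncated_draws = raw[raw > 0.]

# Compare a few quantiles of the two samples.
print(np.percentile(softplus_draws, [5, 25, 50, 75, 95]))
print(np.percentile(truncated_draws, [5, 25, 50, 75, 95]))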

Since the current problem involves only two correlated parameters, I arrived at the following solution:


import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model_hbm:

    # LKJ prior on the 2x2 covariance of (intercept, slope); recover the
    # correlation matrix and the intercept-slope correlation rho from it
    cov_chol_packed = pm.LKJCholeskyCov('cov_chol_packed', n=2, eta=1, sd_dist=pm.Exponential.dist(1.0))
    cov_chol = pm.expand_packed_triangular(n=2, packed=cov_chol_packed)
    cov = cov_chol.dot(cov_chol.T)
    sd = tt.sqrt(tt.diag(cov))
    corr = pm.Deterministic('corr', cov / sd[:, None] / sd[None, :])
    corr_chol = tt.slinalg.cholesky(corr)
    rho = pm.Deterministic('rho', corr[0, 1])

    # population-level parameters; the slope mean is constrained positive
    inter_mu = pm.Normal('inter_mu', mu=0, sd=10)
    inter_sd = pm.HalfNormal('inter_sd', sd=10)

    slope_mu = pm.HalfNormal('slope_mu', sd=10)
    slope_sd = pm.HalfNormal('slope_sd', sd=10)

    # non-centered offsets (n_sites is the number of groups, defined elsewhere);
    # offset_2 is pre-truncated so that the resulting slope is always positive
    offset_1 = pm.Normal('offset_1', mu=0, sd=1, shape=n_sites)
    offset_2 = pm.TruncatedNormal('offset_2', mu=0, sd=1,
                                  lower=-(slope_mu + slope_sd * rho * offset_1) / (tt.sqrt(1 - rho ** 2) * slope_sd),
                                  shape=n_sites)

    inter = pm.Deterministic('inter', inter_mu + inter_sd * offset_1)
    slope = pm.Deterministic('slope', slope_mu + slope_sd * (rho * offset_1 + tt.sqrt(1 - rho ** 2) * offset_2))

....

Essentially, the model pre-truncates the offset sample of the slopes according to the sampled slope population μ and σ, the intercept-slope correlation coefficient, and the offset sample of the intercept. This pre-truncation of the slope offset translates into a truncation at zero for the slope itself.
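Spelling out the algebra behind the lower bound on offset_2, with $z_1$ = offset_1, $z_2$ = offset_2, $\mu_s$ = slope_mu and $\sigma_s$ = slope_sd:

$$
\text{slope}_i = \mu_s + \sigma_s \left( \rho\, z_{1,i} + \sqrt{1-\rho^2}\, z_{2,i} \right) > 0
\quad\Longleftrightarrow\quad
z_{2,i} > -\frac{\mu_s + \sigma_s\, \rho\, z_{1,i}}{\sigma_s \sqrt{1-\rho^2}},
$$

and the right-hand side is exactly the lower argument passed to pm.TruncatedNormal above.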

The model gives seemingly reasonable results with a low number of divergences. The divergences disappear as soon as I remove the LKJ sampling and use:

rho = pm.Uniform('rho', lower=-1, upper=+1)

I would love to hear your thoughts on the above.

Yes (HMC accepts or rejects all variables at the same time, unlike Gibbs).

The model looks pretty reasonable to me. If the LKJ prior is causing issues, you can also place a more informative prior on it (higher eta).
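For example, changing only this line (eta=4 is just an illustrative value; eta=1 is uniform over correlation matrices, while larger eta concentrates mass toward zero correlation):

# a more informative LKJ prior on the correlation
cov_chol_packed = pm.LKJCholeskyCov('cov_chol_packed', n=2, eta=4, sd_dist=pm.Exponential.dist(1.0))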
