Thank you for your reply.
Some comments on your points:
- Using `pm.Potential` led to several divergences, as you predicted. Moreover, the hierarchical setting has a slope vector of size `n_groups`. If I were to use this solution, would I reject the entire slope vector whenever a single instance is negative? That would be equivalent to using `pm.Potential("truncate", tt.switch(tt.any(slope < 0.), -np.inf, 0.))` (see the sketch after this list).
- I can try this idea. However, I would also need to validate how close the sampled distribution is to the truncated bivariate Gaussian, and how sensitive that difference is to model changes.
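To make the first point concrete, here is a minimal sketch of the vector-wide rejection I mean, assuming a toy model in which `slope` is a vector of `n_groups` group-level slopes (the `pm.Normal` prior and the group count are placeholders for illustration):

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

n_groups = 4  # placeholder group count

with pm.Model() as toy_model:
    # Placeholder prior for the group-level slopes.
    slope = pm.Normal('slope', mu=1.0, sd=1.0, shape=n_groups)
    # The joint log-probability becomes -inf as soon as ANY element is
    # negative, so the entire slope vector is rejected together.
    pm.Potential('truncate', tt.switch(tt.any(slope < 0.), -np.inf, 0.))
```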
Since the current problem involves only two correlated parameters, I have arrived at the following solution:
```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# n_sites (the number of sites/groups) is assumed defined earlier.
with pm.Model() as model_hbm:
    # LKJ prior on the 2x2 covariance; only rho is extracted from it.
    cov_chol_packed = pm.LKJCholeskyCov('cov_chol_packed', n=2, eta=1, sd_dist=pm.Exponential.dist(1.0))
    cov_chol = pm.expand_packed_triangular(n=2, packed=cov_chol_packed)
    cov = cov_chol.dot(cov_chol.T)
    sd = tt.sqrt(tt.diag(cov))
    corr = pm.Deterministic('corr', cov / sd[:, None] / sd[None, :])
    corr_chol = tt.slinalg.cholesky(corr)
    rho = pm.Deterministic('rho', corr[0, 1])

    # Population-level priors; the slope mean is constrained to be positive.
    inter_mu = pm.Normal('inter_mu', mu=0, sd=10)
    inter_sd = pm.HalfNormal('inter_sd', sd=10)
    slope_mu = pm.HalfNormal('slope_mu', sd=10)
    slope_sd = pm.HalfNormal('slope_sd', sd=10)

    # Non-centered offsets; offset_2 is truncated from below so that the
    # resulting slope is truncated at zero (see the derivation below).
    offset_1 = pm.Normal('offset_1', mu=0, sd=1, shape=n_sites)
    offset_2 = pm.TruncatedNormal('offset_2', mu=0, sd=1,
                                  lower=-(slope_mu + slope_sd * rho * offset_1) / (tt.sqrt(1 - rho ** 2) * slope_sd),
                                  shape=n_sites)
    inter = pm.Deterministic('inter', inter_mu + inter_sd * offset_1)
    slope = pm.Deterministic('slope', slope_mu + slope_sd * (rho * offset_1 + tt.sqrt(1 - rho ** 2) * offset_2))
    # ....
```
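For reference, sampling then proceeds as usual; raising `target_accept` is the standard knob against residual divergences (the draw counts and `target_accept` value below are placeholders):

```python
with model_hbm:
    trace = pm.sample(2000, tune=2000, target_accept=0.95)
```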
Essentially, the model pre-truncates the offset samples of the slopes according to the sampled slope population μ and σ, the intercept-slope correlation coefficient, and the offset samples of the intercept. This pre-truncation of the slope offsets translates into a truncation at zero for the slope samples:
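Spelling it out, with $z_{1,i}$ and $z_{2,i}$ the offsets and $\mu_s$, $\sigma_s$ the slope population parameters, requiring each slope to be positive gives exactly the `lower` bound passed to `pm.TruncatedNormal`:

$$
\mu_s + \sigma_s\left(\rho\, z_{1,i} + \sqrt{1-\rho^2}\, z_{2,i}\right) > 0
\quad\Longleftrightarrow\quad
z_{2,i} > -\frac{\mu_s + \sigma_s\, \rho\, z_{1,i}}{\sigma_s \sqrt{1-\rho^2}}.
$$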
The model gives seemingly reasonable results with a low number of divergences. The divergences disappear as soon as I remove the LKJ sampling and use:
```python
rho = pm.Uniform('rho', lower=-1, upper=+1)
```
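In case the full variant is useful, this is a sketch of how I would rebuild the 2x2 correlation Cholesky factor directly from `rho` (assuming `corr_chol` is still needed downstream; the rest of the model is unchanged):

```python
with pm.Model() as model_hbm_uniform_rho:
    rho = pm.Uniform('rho', lower=-1, upper=+1)
    # Closed-form 2x2 Cholesky factor of the correlation matrix
    # [[1, rho], [rho, 1]].
    one, zero = tt.constant(1.0), tt.constant(0.0)
    corr_chol = tt.stack([tt.stack([one, zero]),
                          tt.stack([rho, tt.sqrt(one - rho ** 2)])])
    # ....
```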
I would love to hear your thoughts on the above.