Truncated/Bounded Multivariate Normal?


#1

I ran into a question on StackOverflow where someone wanted to impose a strong multivariate Gaussian prior on the coefficients in a binomial regression model. I coded up the following PyMC3 model as a suggested solution for them:

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt
import matplotlib.pyplot as plt
import arviz as az

# Data
X = np.array([-0.86, -0.30, -0.05, 0.73])
N = np.array([5, 5, 5, 5])
Y = np.array([0, 1, 3, 5])

# augment X for simpler regression expression
X_aug = tt.stack([np.ones_like(X), X]).T

# Prior params
mu = np.array([0, 10])
sd = np.array([22, 102])
Rho = np.array([[1, 0.5],[0.5, 1]])
Sigma = np.outer(sd, sd) * Rho

with pm.Model() as binomial_regression:
    # regression coefficients (strong prior)
    beta = pm.MvNormal('beta', mu=mu, cov=Sigma, shape=2)

    # death probability
    theta_i = pm.Deterministic('theta_i', pm.invlogit(X_aug.dot(beta)))

    # outcomes
    y_i = pm.Binomial('y_i', n=N, p=theta_i, observed=Y)

    trace = pm.sample(10000, tune=100000, target_accept=0.8, random_seed=2018)

However, without a high number of tuning steps there were many divergences, and even then I still got one. I don’t find the location of the divergence to be particularly insightful:

[plot showing the location of the divergence]

I do, however, find the posterior insightful about the inappropriateness of the prior: the slope coefficient should really never be negative.

So, I thought a better solution that would still allow for the modeler’s strong prior would be to use a truncated Gaussian that restricts the second coefficient to be nonnegative. However, I’m not sure how to do this, hence the question:

How does one make a truncated/bounded multivariate normal in PyMC3?

Bonus Points: If anyone has thoughts about the divergence issue, those are also welcome. I’m still pretty new to HMC/NUTS, so my experience with reparameterizing models is rather weak.
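For intuition about what such a truncated prior looks like (not part of the original question), here is a minimal numpy sketch using rejection sampling: draw from the unconstrained MvNormal and keep only draws whose slope coefficient is nonnegative. The parameters mirror the prior in the model above.

```python
import numpy as np

# Rejection-sampling sketch of a truncated bivariate normal: draw from the
# unconstrained MvNormal and keep only draws with a nonnegative slope.
# The prior parameters below mirror those in the model above.
mu = np.array([0.0, 10.0])
sd = np.array([22.0, 102.0])
Rho = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma = np.outer(sd, sd) * Rho

rng = np.random.default_rng(2018)
draws = rng.multivariate_normal(mu, Sigma, size=10_000)
kept = draws[draws[:, 1] >= 0]  # truncation: slope coefficient >= 0
```

This is only a prior-predictive illustration; inside a model the same effect is typically achieved via the log-probability, as the answers below discuss.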


#2

Maybe the easiest is to put a potential in the model, something like pm.Potential("bound", tt.switch(beta[0] > 0, 0, -np.inf)), which makes the sampler reject any beta[0] smaller than 0.
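As a plain-numpy sketch of what this potential contributes (np.where standing in for tt.switch): it adds 0 to the joint log-probability where the constraint holds and -inf where it is violated, so any offending proposal is always rejected.

```python
import numpy as np

def bound_potential(beta):
    # numpy analogue of tt.switch(beta[0] > 0, 0, -np.inf): contributes 0 to
    # the joint log-probability where the constraint holds and -inf where it
    # is violated, so offending proposals are always rejected
    return np.where(beta[0] > 0, 0.0, -np.inf)
```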


#3

Out of curiosity, how would I specify multiple bounds? Let’s say I have a bivariate distribution, and I want both beta[0] and beta[1] to be non-negative.
The following (admittedly naive) approach does not work:

pm.Potential("bound", tt.switch(beta[0] > 0 and beta[1] > 0, 0, -np.inf))

Thanks a lot in advance!


#4

How about: pm.Potential("bound", tt.switch((beta[0] > 0)*(beta[1] > 0), 0, -np.inf))?
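For anyone wondering why the multiplication trick is needed: Python's `and` tries to reduce each operand to a single truth value, which fails for symbolic Theano tensors much as it does for numpy arrays, whereas multiplying the boolean conditions acts as an elementwise logical AND. A small numpy illustration:

```python
import numpy as np

a = np.array([1.0, -1.0])
b = np.array([2.0, 3.0])

# Python's `and` tries to reduce each operand to a single truth value,
# which raises for numpy arrays; symbolic tensors hit the same problem.
try:
    (a > 0) and (b > 0)
except ValueError:
    pass  # "the truth value of an array ... is ambiguous"

# Multiplying the boolean conditions acts as an elementwise logical AND:
both = (a > 0) * (b > 0)
```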


#5

Thanks a lot. In principle, this works.

However, I noticed an undesired effect that introducing a potential has on the posterior distribution. Even if I specify a trivial potential

muBounded = pm.Potential("muBounded", tt.switch(1, 0, 0))

the sampled posterior distribution changes. Is this expected, and if so, why?

Here’s the model specification with potential:

with pm.Model() as update_model:
    # non-centered parametrization
    mean_error = pm.Normal('mean_error', 0, 1, shape=dimension)
    mu = pm.Deterministic('mu', mu_marginal_mean + tt.dot(mu_cholesky, mean_error))
    muBounded = pm.Potential("muBounded", tt.switch(1, 0, 0))

    L_packed_error = pm.Normal('L_packed_error', 0, 1, shape=packedMatrixDimension[dimension])
    L_packed = pm.Deterministic('L_packed', L_packed_marginal_mean + tt.dot(L_packed_cholesky, L_packed_error))

    obs = pm.MvNormal('obs', muBounded, chol=pm.expand_packed_triangular(dimension, L_packed), observed=observations)

This is the original model without potential:

with pm.Model() as update_model:
    # non-centered parametrization
    mean_error = pm.Normal('mean_error', 0, 1, shape=dimension)
    mu = pm.Deterministic('mu', mu_marginal_mean + tt.dot(mu_cholesky, mean_error))

    L_packed_error = pm.Normal('L_packed_error', 0, 1, shape=packedMatrixDimension[dimension])
    L_packed = pm.Deterministic('L_packed', L_packed_marginal_mean + tt.dot(L_packed_cholesky, L_packed_error))

    obs = pm.MvNormal('obs', mu, chol=pm.expand_packed_triangular(dimension, L_packed), observed=observations)

As you can see from the posterior distribution shown below (see second plot), the sampled distributions no longer center around the “true” values, which are represented by black vertical lines. Interestingly, this only affects the variable that is not constrained directly (L_packed, not mu, to which I apply the bound).

The full code of the MWE, as well as the corresponding plot of the model without potential is available in this thread: Updating multivariate priors

I’d be very curious to hear any opinions on this!


#6

Likely there are other things interacting with your parameters - did you try to evaluate the logp to see if they are the same?
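In PyMC3 that check would be something along the lines of model.logp(model.test_point) for both models. The underlying idea, sketched below with hypothetical numpy stand-ins for the two models' joint log-densities, is that a potential which is genuinely zero everywhere cannot change the logp, so if the posteriors differ, something else in the model must have changed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two models' joint log-densities: a potential
# that is genuinely zero everywhere leaves the logp unchanged at every point.
def logp_plain(mu):
    return -0.5 * np.sum(mu ** 2)

def logp_with_potential(mu):
    potential = 0.0  # what a truly trivial potential contributes
    return -0.5 * np.sum(mu ** 2) + potential

points = rng.normal(size=(100, 3))
agree = all(np.isclose(logp_plain(p), logp_with_potential(p)) for p in points)
```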


#7

Thanks for the hint!

It turns out I just made a basic mistake. If I introduce a trivial potential (note the second argument within switch(), which is now mu instead of 0!)

muBounded = pm.Potential("muBounded", tt.switch(1, mu, 0))

it all works as expected.
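A plain-numpy analogue (np.where standing in for tt.switch) of why the original version misbehaved: switch(cond, a, b) returns a when cond is true, so switch(1, 0, 0) evaluates to 0 and the observed MvNormal was centered at zero, while switch(1, mu, 0) passes mu through unchanged.

```python
import numpy as np

mu = np.array([1.2, -0.7])

# np.where standing in for tt.switch: switch(cond, a, b) returns a when cond
# is true, so switch(1, 0, 0) is just 0 -- the observed MvNormal was being
# centered at zero instead of at mu. switch(1, mu, 0) passes mu through.
buggy = np.where(1, 0, 0)
fixed = np.where(1, mu, 0)
```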