Multivariate hierarchical linear model with different hyperpriors



I’m trying to build a multivariate hierarchical linear model, but I’m having trouble finding resources on how to do so. I can find examples of hierarchical univariate models and non-hierarchical multivariate models, but not the two together.

Fundamentally, I’m trying to build the following linear model (one per category) with shared hyperpriors between categories:

y_est = a[category] + b[category]*X

where category is an element of [0, 1, 2, 3] and X is an input matrix of num_training_samples by num_pred (10) covariates.

I’ve adapted the scikit learn wrapped linear model implementation at to perform this regression.

In the code below, which isn’t the full code, just the part that constructs the model, the following variables mean:

num_pred - number of predictors or covariates (10)
num_cats - number of categories for which to compute different coefficients (4)
model_cats - vector representing the category for each training sample [0, 1, 2, 3, 0, 1, 2, 3 … etc]

model_input  = theano.shared(np.zeros([num_training_samples, num_pred]))
model_output = theano.shared(np.zeros(num_training_samples, dtype='float'))
model_cats   = theano.shared(np.zeros(num_training_samples, dtype='int'))

model = pm.Model()

with model:
    # Hyperpriors for group nodes
    mu_alpha    = pm.Normal('mu_alpha', mu=0, sd=100)
    mu_beta     = pm.Normal('mu_beta', mu=0,  sd=100)
    sigma_alpha = pm.HalfCauchy('sigma_alpha', 5)
    sigma_beta  = pm.HalfCauchy('sigma_beta',  5)

    # model error
    eps = pm.HalfCauchy('eps', 5)
    # alpha and betas
    alpha = pm.Normal('alpha', mu=mu_alpha, sd=sigma_alpha, shape=(num_cats,))
    betas = pm.Normal('beta',  mu=mu_beta,  sd=sigma_beta,  shape=(num_cats, num_pred))

    c = model_cats
    test_est = alpha[c] + T.sum(betas[c] * model_input, 1)
    y_est = pm.Normal('y_est', test_est, sd=eps, observed=model_output)

So I am getting one alpha per category and all 4 alphas are sharing the hyperprior.

I don’t believe my beta hyperpriors are quite right. Since each beta represents a different covariate, it seems like it’s a mistake to set the same hyperpriors to be used for all of them. I would like to for my beta hyperpriors to be shared between categories, but not between different covariates.

I’ve made the following change to the shape to accomplish this:

    mu_beta     = pm.Normal('mu_beta',  mu=0, sd=100, shape=(1, self.num_pred))
    sigma_beta  = pm.HalfCauchy('sigma_beta',  5, shape=(1, self.num_pred))

The model seems to be building and inference seems to work, but I am having some trouble getting it to converge.

Is my concept reasonable?
Is my modification to the code above the correct way to accomplish my goal?


Your model might be over-specified, which makes inference difficult - I suggest you to try more informative prior, maybe start with sd=10 and HalfNormal with sd=2.5


Thanks for the response! I’ll try running that over the weekend to see how the inference works.

In the meantime, I feel a bit shaky about what the appropriate shape of mu_beta and sigma_beta ought to be.

Does my intuition about making them length(covariates)-dimensional to generate a different prior for each covariate make sense or should I be just keeping it at a 1d prior for all the covariates in the model specification?