Hi,
I would like to specify a hierarchical linear model in which each coefficient is drawn from one of two prior distributions. In particular, if you arrange the N^2 coefficients as an NxN matrix, then the coefficients on the main diagonal should have some distribution D1 as their prior, and the off-diagonal coefficients should have a different prior distribution D2.
I’ve attempted to implement this myself, but the sampler fails to converge even on data generated by this exact process. That makes me think I’ve misspecified the model in my implementation, so my question is: what is the principled way to implement what I described above?
To be clear, the model should look something like:
y = b_{0, 0} x_{0, 0} + b_{0, 1} x_{0, 1} + … + b_{i, j} x_{i, j} + … + b_{N-1, N-1} x_{N-1, N-1}
where b_{i, i} ~ D_1(\phi) and b_{i, j} ~ D_2(\theta) for i ≠ j,
and \phi and \theta are themselves inferred but with priors corresponding to known distributions.
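To make the indexing concrete, here is a small NumPy-only sketch (the names are mine, purely illustrative) of which positions in the flattened coefficient vector would get which prior:

import numpy as np

N = 3
pos = np.arange(N * N).reshape(N, N)               # flat position of each matrix entry
diag_pos = np.diag(pos)                            # these coefficients get prior D_1(\phi)
offdiag_pos = np.setdiff1d(pos.ravel(), diag_pos)  # these coefficients get prior D_2(\theta)
print(diag_pos)     # [0 4 8]
print(offdiag_pos)  # [1 2 3 5 6 7]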
Thanks.
My attempted implementation:
import numpy as np
import theano.tensor as tt
import pymc3 as pm
import matplotlib.pyplot as plt
n = 2                     # N: coefficients form an n x n matrix
num_regressors = n ** 2   # one coefficient per matrix entry
num_data_samples = 2000
# hyperprior parameters (phi: diagonal hypermean, theta: off-diagonal hypermean)
phi_mean_mu = 0
phi_mean_sd = 10
theta_mean_mu = -10
theta_mean_sd = 10
# sample true coefficient vectors
true_bi_mean = np.random.normal(phi_mean_mu, 0.1)
true_bij_mean = np.random.normal(theta_mean_mu, 0.1)
true_bi = np.random.normal(true_bi_mean, 0.1, size=n)                     # diagonal coefficients
true_bij = np.random.normal(true_bij_mean, 0.1, size=num_regressors - n)  # off-diagonal coefficients
# generate data
X = np.random.randint(2, size=[num_data_samples, num_regressors])
# interleave so that each row block of length n starts with that row's diagonal
# coefficient, followed by its n - 1 off-diagonal coefficients
regress = np.hstack((true_bi[0], true_bij[0 : n - 1]))
for k in range(1, n):
    regress = np.hstack((regress, np.hstack((true_bi[k], true_bij[k * (n - 1) : k * (n - 1) + n - 1]))))
# note: y is generated without observation noise, although the likelihood
# below includes a Gaussian noise term eps
y = np.zeros((num_data_samples, 1))
for k in range(num_data_samples):
    y[k, 0] = np.dot(regress, X[k, :])
# specify the hierarchical model and infer the weights
with pm.Model() as hierarchical_model:
    # hyperpriors for the group means: phi for the diagonal, theta for the off-diagonal
    mu_phi = pm.Normal('mu_phi', mu=phi_mean_mu, sd=phi_mean_sd)
    mu_theta = pm.Normal('mu_theta', mu=theta_mean_mu, sd=theta_mean_sd)
    # priors for diagonal (b_i) and off-diagonal (b_ij) coefficients
    b_i = pm.Normal('b_i', mu=mu_phi, sd=1, shape=(n, 1))
    b_ij = pm.Normal('b_ij', mu=mu_theta, sd=1, shape=(num_regressors - n, 1))
    # model error
    eps = pm.HalfNormal('eps', 1)
    # reward estimate
    # the following lines interleave the coefficient vectors b_i and b_ij into a
    # single flat vector matching the layout of regress above (see the
    # set_subtensor sketch after the code for an alternative construction)
    reg = pm.math.concatenate([b_i[0:1], b_ij[0 : n - 1]], axis=0)
    for k in range(1, n):
        reg = pm.math.concatenate((reg, pm.math.concatenate((b_i[k : k + 1], b_ij[k * (n - 1) : k * (n - 1) + n - 1]), axis=0)), axis=0)
    reward_est = pm.math.dot(X, reg)
    # data likelihood
    reward_like = pm.Normal('reward_like', mu=reward_est, sd=eps, observed=y)
with hierarchical_model:
    trace = pm.sample(5000)
pm.traceplot(trace)
plt.show()
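For reference, one alternative I can think of for assembling the coefficient vector inside the model, instead of the concatenation loop, is to scatter the two coefficient groups by position with theano.tensor's set_subtensor. This is only a sketch (untested; it indexes coefficients by their true matrix positions, so the data-generation layout would have to match, and alt_model and the index names are hypothetical):

# index bookkeeping as in the sketch above
pos = np.arange(n * n).reshape(n, n)
diag_pos = np.diag(pos)
offdiag_pos = np.setdiff1d(pos.ravel(), diag_pos)

with pm.Model() as alt_model:
    mu_phi = pm.Normal('mu_phi', mu=phi_mean_mu, sd=phi_mean_sd)
    mu_theta = pm.Normal('mu_theta', mu=theta_mean_mu, sd=theta_mean_sd)
    b_i = pm.Normal('b_i', mu=mu_phi, sd=1, shape=n)                       # diagonal
    b_ij = pm.Normal('b_ij', mu=mu_theta, sd=1, shape=num_regressors - n)  # off-diagonal
    eps = pm.HalfNormal('eps', 1)
    # scatter both groups into one flat length-n^2 coefficient vector
    coeffs = tt.zeros(num_regressors)
    coeffs = tt.set_subtensor(coeffs[diag_pos], b_i)
    coeffs = tt.set_subtensor(coeffs[offdiag_pos], b_ij)
    reward_est = pm.math.dot(X, coeffs)
    reward_like = pm.Normal('reward_like', mu=reward_est, sd=eps, observed=y.ravel())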