# Hierarchical Linear Regression Model -- why is it not converging?

I created a toy example: five regions with x and y data, 30 data points each, except region 5, which contains just a single pair:

```python
import numpy as np
import scipy.stats as st

N = 30
beta_0 = [3, 3.2, 2.4, 4.1]
beta_1 = [11, 10, 9.1, 14.1]
idx = np.repeat(range(4), N)
x = np.array([]); y = np.array([])
for b_0, b_1 in zip(beta_0, beta_1):
    x_new = st.uniform(0, 20).rvs(N)
    x = np.append(x, x_new)
    eps = st.norm(0, 10).rvs(N)
    y = np.append(y, b_0 + b_1 * x_new + eps)
# region 5: a single data point
x = np.append(x, 10); y = np.append(y, 3 + 11 * 10)
idx = np.append(idx, 4)
```
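As a quick sanity check outside of pymc3 (this is not the model from the post, just an illustrative sketch), per-region ordinary least-squares fits on data simulated the same way should recover the coefficients; all names here are my own:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 30
true_b0 = [3, 3.2, 2.4, 4.1]
true_b1 = [11, 10, 9.1, 14.1]

fits = []
for b0, b1 in zip(true_b0, true_b1):
    x = rng.uniform(0, 20, N)
    y = b0 + b1 * x + rng.normal(0, 10, N)
    # ordinary least squares per region: y ≈ intercept + slope * x
    slope, intercept = np.polyfit(x, y, 1)
    fits.append((intercept, slope))

for (b0, b1), (b0_hat, b1_hat) in zip(zip(true_b0, true_b1), fits):
    print(f"true ({b0}, {b1})  estimated ({b0_hat:.2f}, {b1_hat:.2f})")
```

Region 5 is excluded here because a single data point cannot be fit on its own; that is exactly the case where partial pooling in a hierarchical model should help.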

A non-hierarchical model works fine / as expected: the density plots of the posterior samples for beta_0 and beta_1 per region look sensible. But my attempt to model it hierarchically fails. This is the pymc3 code:

```python
with pm.Model() as hierarch:
    mu_b_0 = pm.Normal("mu_b_0", mu=0, sd=10)
    sd_b_0 = pm.HalfCauchy("sd_b_0", 5)
    mu_b_1 = pm.Normal("mu_b_1", mu=0, sd=10)
    sd_b_1 = pm.HalfCauchy("sd_b_1", 5)
    beta_0 = pm.Normal("beta_0", mu=mu_b_0, sd=sd_b_0, shape=5)
    beta_1 = pm.Normal("beta_1", mu=mu_b_1, sd=sd_b_1, shape=5)
    nu = pm.Deterministic("nu", pm.Exponential("nu1", 1/20) + 1)
    eps = pm.HalfCauchy("eps", 5)
    pred = pm.StudentT("pred", mu=beta_0[idx] + beta_1[idx] * x,
                       sd=eps, nu=nu, observed=y)
    start = pm.find_MAP()
    trace_hier = pm.sample(2000, start=start)
```

The traceplots for beta_0 and beta_1 show that the chains do not converge - specifically, there is no self-similarity in the traces and no sensible estimates for the hyperparameters…

Does anyone know how I can improve this?

Sampling with the defaults is recommended; starting from `find_MAP()` is usually not ideal.
For example, sampling with the defaults seems to give quite sensible results:

```python
with hierarch:
    trace_hier = pm.sample(2000, tune=1000)
pm.traceplot(trace_hier, varnames=['beta_0', 'beta_1', 'eps'],
             lines={'beta_0': [3, 3.2, 2.4, 4.1, 3],
                    'beta_1': [11, 10, 9.1, 14.1, 11]});
```

However, there are some divergence warnings; you might want to change the prior on the sds to HalfNormal instead of HalfCauchy.
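To see why the HalfCauchy can cause trouble here, compare its tail with a HalfNormal at the same scale (a small scipy sketch, not part of the model code):

```python
import scipy.stats as st

scale = 5
q = 0.99
hc = st.halfcauchy(scale=scale).ppf(q)  # heavy-tailed prior
hn = st.halfnorm(scale=scale).ppf(q)    # light-tailed prior
print(f"99th percentile  HalfCauchy({scale}): {hc:.1f}   HalfNormal({scale}): {hn:.1f}")
```

The HalfCauchy puts non-trivial mass on very large group-level sds, which the sampler has to explore; the HalfNormal keeps the same scale near zero but cuts off that extreme tail.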

One way to improve the model is:

```python
with pm.Model() as hierarch:
    mu_b_0 = pm.Normal("mu_b_0", mu=0, sd=100)
    sd_b_0 = pm.HalfCauchy("sd_b_0", 5)
    mu_b_1 = pm.Normal("mu_b_1", mu=0, sd=100)
    sd_b_1 = pm.HalfCauchy("sd_b_1", 5)
    beta0_ = pm.Normal("beta0_", mu=0, sd=1, shape=5)
    beta1_ = pm.Normal("beta1_", mu=0, sd=1, shape=5)
    beta_0 = pm.Deterministic('beta_0', beta0_ * sd_b_0 + mu_b_0)
    beta_1 = pm.Deterministic('beta_1', beta1_ * sd_b_1 + mu_b_1)
    nu = pm.Exponential("nu", 2)
    eps = pm.HalfCauchy("eps", 5)
    pred = pm.StudentT(
        "pred", mu=beta_0[idx] + beta_1[idx] * x, sd=eps, nu=nu + 1, observed=y)
    trace_hier = pm.sample(2000, tune=1000)
```

The affine transformation from `beta0_` to get `beta_0` (the so-called non-centered parameterization) helps a lot to get rid of divergences.
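The transformation leaves the implied distribution of `beta_0` unchanged; it only changes how the sampler parameterizes it. A quick numpy check of this equivalence (illustrative values, not from the model):

```python
import numpy as np

rng = np.random.default_rng(42)

mu, sd = 3.0, 1.5
# centered:     sample beta ~ Normal(mu, sd) directly
# non-centered: sample z ~ Normal(0, 1), then beta = mu + sd * z
z = rng.standard_normal(100_000)
beta = mu + sd * z

print(beta.mean(), beta.std())  # close to mu and sd
```

In the centered form the sampler must explore a funnel between `sd_b_0` and `beta_0`; in the non-centered form `beta0_` is a standard normal independent of the hyperparameters, which is much easier geometry for NUTS.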

Wow, many thanks for the competent answers - works perfectly!
