Hierarchical Linear Regression Model -- why is it not converging?

I created a toy example: 5 regions with x and y data, 30 data points each for the first four regions; region 5 contains just one pair:

import numpy as np
import scipy.stats as st
import pymc3 as pm

N = 30
beta_0 = [3, 3.2, 2.4, 4.1]
beta_1 = [11, 10, 9.1, 14.1]
idx = np.repeat(range(4),N)
x = np.array([]) ; y = np.array([])
for b_0,b_1 in zip(beta_0,beta_1): 
    x_new = st.uniform(0,20).rvs(N)
    x = np.append(x, x_new)
    eps = st.norm(0,10).rvs(N)
    y = np.append(y,b_0 + b_1*x_new + eps)
x = np.append(x,10) ; y = np.append(y,3+11*10)
idx = np.append(idx,4)
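Since `idx` is later used to index the coefficient vectors, it is worth sanity-checking the array shapes the loop produces. A compact, self-contained rerun of the data generation above (no seed; the shapes are deterministic either way):

```python
import numpy as np
import scipy.stats as st

N = 30
beta_0 = [3, 3.2, 2.4, 4.1]
beta_1 = [11, 10, 9.1, 14.1]
idx = np.repeat(range(4), N)          # region index 0..3, repeated N times
x = np.array([]); y = np.array([])
for b_0, b_1 in zip(beta_0, beta_1):
    x_new = st.uniform(0, 20).rvs(N)  # predictors for one region
    x = np.append(x, x_new)
    eps = st.norm(0, 10).rvs(N)       # observation noise
    y = np.append(y, b_0 + b_1 * x_new + eps)
# region 5 (index 4): a single noiseless pair
x = np.append(x, 10); y = np.append(y, 3 + 11 * 10)
idx = np.append(idx, 4)

print(x.shape, y.shape, idx.shape)  # (121,) (121,) (121,)
print(np.unique(idx))               # [0 1 2 3 4]
```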

A non-hierarchical model works fine / as expected. Here are density plots of the posterior samples for beta_0 and beta_1 per region:

[density plots of the posterior samples per region]

But my attempt to model it hierarchically fails. Here's the pymc3 code:

with pm.Model() as hierarch:
    mu_b_0 = pm.Normal("mu_b_0", mu=0, sd=10)
    sd_b_0 = pm.HalfCauchy("sd_b_0", 5)
    mu_b_1 = pm.Normal("mu_b_1", mu=0, sd=10)
    sd_b_1 = pm.HalfCauchy("sd_b_1", 5)
    beta_0 = pm.Normal("beta_0", mu=mu_b_0, sd=sd_b_0, shape=5)
    beta_1 = pm.Normal("beta_1", mu=mu_b_1, sd=sd_b_1, shape=5)
    nu = pm.Deterministic("nu", pm.Exponential("nu1", 1/20)+1)
    eps = pm.HalfCauchy("eps",5)
    pred = pm.StudentT("pred", mu=beta_0[idx] + beta_1[idx]*x, sd = eps, nu=nu, observed=y)
    start = pm.find_MAP()
    trace_hier = pm.sample(2000,start=start)

The traceplots for beta_0 and beta_1 show that the chains do not converge: the traces show no self-similarity, and the estimates for the hyperpriors are not sensible…

Does anyone know how I can improve this?

Sampling with the defaults is recommended; starting from find_MAP() is usually not ideal. For example, sampling with the defaults seems to give quite sensible results:

with hierarch:
    trace_hier = pm.sample(2000, tune=1000)
pm.traceplot(trace_hier, varnames=['beta_0', 'beta_1', 'eps'],
            lines={'beta_0':[3, 3.2, 2.4, 4.1, 3],
                   'beta_1':[11, 10, 9.1, 14.1, 11]});

However, there are some divergence warnings; you might want to change the priors on the sd parameters from HalfCauchy to HalfNormal.
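The reasoning behind the HalfNormal suggestion is tail weight: a HalfCauchy puts substantial mass on very large scale values, which can push the sampler into difficult regions. A quick scipy sketch (the cutoff of 20 is just illustrative; both priors use scale 5, matching the model above):

```python
import scipy.stats as st

# Probability that the scale parameter exceeds 20 under each prior
p_cauchy = st.halfcauchy(scale=5).sf(20)
p_norm = st.halfnorm(scale=5).sf(20)

print(p_cauchy)  # ~0.16 -- heavy tail
print(p_norm)    # ~6e-5 -- essentially no mass out there
```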

One way to improve the model is:

with pm.Model() as hierarch:
    mu_b_0 = pm.Normal("mu_b_0", mu=0, sd=100)
    sd_b_0 = pm.HalfCauchy("sd_b_0", 5)
    mu_b_1 = pm.Normal("mu_b_1", mu=0, sd=100)
    sd_b_1 = pm.HalfCauchy("sd_b_1", 5)
    beta0_ = pm.Normal("beta0_", mu=0, sd=1, shape=5)
    beta1_ = pm.Normal("beta1_", mu=0, sd=1, shape=5)
    beta_0 = pm.Deterministic('beta_0', beta0_*sd_b_0+mu_b_0)
    beta_1 = pm.Deterministic('beta_1', beta1_*sd_b_1+mu_b_1)
    nu = pm.Exponential("nu", 2)
    eps = pm.HalfCauchy("eps", 5)
    pred = pm.StudentT(
        "pred", mu=beta_0[idx] + beta_1[idx] * x, sd=eps, nu=nu+1, observed=y)
    trace_hier = pm.sample(2000, tune=1000)

The affine transformation from beta0_ to beta_0 (a non-centered parameterization) helps a lot to get rid of the divergences.
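The key point is that the reparameterization does not change the model: drawing beta directly from Normal(mu, sd) and drawing a standard normal and then shifting/scaling it define the same distribution, only with a geometry that is easier for the sampler. A quick numpy check (sample size and parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sd, n = 3.0, 2.0, 200_000

# Centered: draw beta directly from Normal(mu, sd)
beta_centered = rng.normal(mu, sd, size=n)

# Non-centered: draw a standard normal, then shift and scale
z = rng.normal(0.0, 1.0, size=n)
beta_noncentered = mu + sd * z

# Same distribution, different parameterization
print(beta_centered.mean(), beta_noncentered.mean())  # both ~3.0
print(beta_centered.std(), beta_noncentered.std())    # both ~2.0
```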

Wow, many thanks for the competent answers - works perfectly!
