Hierarchical Linear Regression - Simplest Possible Example

Hi all,

I’ve attempted to create the simplest possible example to illustrate hierarchical models for linear regression (even simpler than the Radon gas case study).

Problem Setup
The data consists of 2 univariate regressions (scalar regression coefficient, no intercept):

\begin{align}
x_{1} &\sim \text{Normal}(\mu = 0, \sigma^2 = 0.25) \\
x_{2} &\sim \text{Normal}(\mu = 0, \sigma^2 = 0.25) \\
\eta &\sim \text{Normal}(\mu = 0, \sigma^2 = 1) \\
\beta_1 &= 2, \hspace{2mm} \beta_2 = -5 \\
y_1 &= \beta_1 x_1 + \eta \\
y_2 &= \beta_2 x_2 + \eta
\end{align}

Then I concatenated y_1, y_2 and x_1, x_2 into single vectors, e.g. \mathbf{y} = [y_1^{(0)} \dots y_1^{(N_1)}, y_2^{(0)} \dots y_2^{(N_2)}], and added an indicator variable called site to differentiate the data from the 2 different linear models.

import numpy as np
import pandas as pd


def create_dataset() -> pd.DataFrame:
    """Assume there are 2 sites with different regression coefficients."""

    n = 20
    beta_opt_1 = 2.0
    beta_opt_2 = -5.0

    std_noise_x = 0.5
    std_noise_y = 1

    x_1 = std_noise_x * np.random.randn(n)
    y_1 = beta_opt_1 * x_1 + std_noise_y * np.random.randn(n)

    x_2 = std_noise_x * np.random.randn(n)
    y_2 = beta_opt_2 * x_2 + std_noise_y * np.random.randn(n)

    df_1 = pd.DataFrame({"y": y_1, "x": x_1}).assign(
        **{"site": "site_A", "site_idx": 0}
    )
    df_2 = pd.DataFrame({"y": y_2, "x": x_2}).assign(
        **{"site": "site_B", "site_idx": 1}
    )

    df = pd.concat([df_1, df_2], axis=0, ignore_index=True)

    return df


data = create_dataset()
n_sites = len(data["site"].unique())
site_indices = data["site_idx"].values

Then, I performed inference using 3 different schemes (similar to the Radon case study):

  • Pooled
  • Unpooled (individual regressions for y_1 and y_2)
  • Hierarchical regression

Pooled

import pymc3 as pm  # PyMC3-style API (sd= keyword, MultiTrace.get_values)

with pm.Model() as pooled_model:

    # Define priors
    sigma_pooled = pm.HalfCauchy("sigma_pooled", beta=5)
    beta_pooled = pm.Normal("beta_pooled", mu=0, sd=20)

    # Define likelihood
    likelihood = pm.Normal(
        "y_hat", mu=beta_pooled * data["x"], sd=sigma_pooled, observed=data["y"]
    )

    # Inference!
    trace_pooled = pm.sample(draws=3000)
    
    # Getting one chain
    posterior_beta_pooled = trace_pooled.get_values("beta_pooled", burn=1000, chains=[0])

Unpooled

with pm.Model() as individual_model:

    # Define priors, now with one set per site (shape=n_sites)
    sigma_individual = pm.HalfCauchy("sigma_individual", beta=5, shape=n_sites)
    beta_individual = pm.Normal("beta_individual", mu=0, sd=20, shape=n_sites)

    # Define likelihood
    likelihood = pm.Normal(
        "y_individual",
        mu=beta_individual[site_indices] * data["x"],
        sd=sigma_individual[site_indices],
        observed=data["y"],
    )

    # Inference!
    trace_individual = pm.sample(draws=3000)
    
    posterior_beta_individual = trace_individual.get_values(
        "beta_individual", burn=1000, chains=[0]
    )

Hierarchical

with pm.Model() as hierarchical_model:

    sigma_hierarchical = pm.HalfCauchy("sigma_hierarchical", beta=5, shape=n_sites)

    # The step that makes it hierarchical:
    # only beta is linked to a global (group-level) distribution; sigma stays per-site
    mu_global = pm.Normal("mu_global", mu=0, sd=0.1)
    sd_global = pm.HalfCauchy("sd_global", beta=0.1)
    beta_hierarchical = pm.Normal(
        "beta_hierarchical", mu=mu_global, sd=sd_global, shape=n_sites
    )

    # Define likelihood
    likelihood = pm.Normal(
        "y_hierarchical",
        mu=beta_hierarchical[site_indices] * data["x"],
        sigma=sigma_hierarchical[site_indices],
        observed=data["y"],
    )

    # Inference!
    trace_hierarchical = pm.sample(draws=3000)
    
    posterior_beta_hierarchical = trace_hierarchical.get_values(
        "beta_hierarchical", burn=1000, chains=[1]
    )

The goal of the exercise was to show that the hierarchical estimates sit somewhere in between the pooled and individual estimates, and that this difference is affected by a few things (a quick comparison sketch follows the list):

  1. The confidence in the hyper-prior
  2. The amount of data available for the different “sites” (i.e. confidence in the data)
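
To make the comparison concrete, here is a minimal sketch (assuming the posterior_beta_pooled, posterior_beta_individual and posterior_beta_hierarchical arrays computed above, with each per-site trace of shape (n_samples, n_sites)) that prints the posterior mean of beta for each site under each scheme:

import numpy as np

# posterior_beta_pooled has shape (n_samples,);
# the unpooled and hierarchical traces have shape (n_samples, n_sites)
summary = {
    "pooled": [np.mean(posterior_beta_pooled)] * n_sites,
    "unpooled": np.mean(posterior_beta_individual, axis=0),
    "hierarchical": np.mean(posterior_beta_hierarchical, axis=0),
}

for scheme, means in summary.items():
    for site_idx, mean in enumerate(means):
        print(f"{scheme:>12} | site {site_idx} | posterior mean beta = {mean:+.2f}")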

But I find that the hierarchical model always yields posteriors closer to the individual estimates than to the pooled estimate. In other words, it looks like the hierarchical formulation doesn’t change the results compared to the individual estimates.

  1. Can I formulate the problem better to illustrate the mechanics of hierarchical modelling? (One possible reformulation is sketched below.)
  2. Is there a different set of hyper-parameters needed to illustrate what I want to show?
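
Regarding question 1, one reformulation that might make the partial pooling easier to see is to give one site far fewer observations, so its unpooled estimate is noisy and has more room to be pulled toward the group mean. A minimal sketch (the create_unbalanced_dataset helper and its sample sizes are hypothetical, but it uses the same generative process as create_dataset above):

import numpy as np
import pandas as pd


def create_unbalanced_dataset(n_a: int = 50, n_b: int = 3) -> pd.DataFrame:
    """Same generative process as create_dataset, but site B gets far fewer points."""
    beta_opt_1, beta_opt_2 = 2.0, -5.0
    std_noise_x, std_noise_y = 0.5, 1.0

    x_1 = std_noise_x * np.random.randn(n_a)
    y_1 = beta_opt_1 * x_1 + std_noise_y * np.random.randn(n_a)

    x_2 = std_noise_x * np.random.randn(n_b)
    y_2 = beta_opt_2 * x_2 + std_noise_y * np.random.randn(n_b)

    df_1 = pd.DataFrame({"y": y_1, "x": x_1}).assign(site="site_A", site_idx=0)
    df_2 = pd.DataFrame({"y": y_2, "x": x_2}).assign(site="site_B", site_idx=1)

    # With only a handful of points for site B, its likelihood is weak and the
    # hierarchical posterior for its beta should lean more on the group-level prior.
    return pd.concat([df_1, df_2], axis=0, ignore_index=True)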

Hi,

I read everything rather quickly, so please tell me if I missed something. You are using a dataset with only two groups and two parameters to estimate it. If you want to pool towards the mean, I would suggest that you only specify a global mean with a larger sd. You can see that in this way you achieve the pooled effect.

with pm.Model() as hierarchical_model:

    sigma_hierarchical = pm.HalfCauchy("sigma_hierarchical", beta=5, shape=n_sites)

    # The step that makes it hierarchical:
    # only a global mean is specified; the group-level sd is fixed below
    mu_global = pm.Normal("mu_global", mu=0, sd=5)
    # sd_global = pm.HalfCauchy("sd_global", beta=0.1)
    beta_hierarchical = pm.Normal(
        "beta_hierarchical", mu=mu_global, sd=0.5, shape=n_sites
    )

    # Define likelihood
    likelihood = pm.Normal(
        "y_hierarchical",
        mu=beta_hierarchical[site_indices] * data["x"],
        sigma=sigma_hierarchical[site_indices],
        observed=data["y"],
    )

    # Inference!
    trace_hierarchical = pm.sample(draws=3000)
    
    posterior_beta_hierarchical = trace_hierarchical.get_values(
        "beta_hierarchical", burn=1000, chains=[1]
    )



It worked!

As you suggested, the trick was not to have a global standard deviation for the coefficients, only a global mean.

I haven’t thought about it properly yet, but what’s the reason behind this?

Is it that by also trying to learn the uncertainty in the coefficients, you actually end up learning the individual estimates?

In the Radon gas example, they do try to learn the uncertainty in the coefficients, but it seems to “work”.
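
One way to probe this (a sketch, assuming the trace_hierarchical from the original model with the learned sd_global is still in scope): look at the posterior of sd_global. A plausible explanation is that with only two sites whose true coefficients are far apart (2 and -5), the heavy-tailed HalfCauchy hyper-prior can let sd_global grow large, and a large group-level sd means the global mean barely constrains the per-site betas, so they stay close to the unpooled estimates.

import numpy as np

# Posterior of the learned group-level sd from the original hierarchical model
sd_global_samples = trace_hierarchical.get_values("sd_global", burn=1000)

print("posterior mean of sd_global:", np.mean(sd_global_samples))
print("posterior 95% interval:", np.percentile(sd_global_samples, [2.5, 97.5]))
# If these values are large relative to the spread of the betas,
# the group-level prior provides almost no shrinkage.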
