Making # level hierarchy in bayesian model

We are working on a hierarchical model to predict disease case counts at the subdivision level (also referred to as “blocks”), which are administrative units within districts, and districts themselves are part of a state or global-level hierarchy (e.g., the whole of India).

The following data is available:

  1. Subdivision/Block-Level Data:
  • Disease case counts (response variable).
  • Attributes like economic, geographic, and educational features (predictors).
  1. District-Level Data:
  • Aggregated or district-specific attributes related to the predictors (e.g., district-wide averages for literacy, income, or rainfall).
  1. Global/State-Level Data:
  • Broader indicators applicable across the whole state or country.

Goal:

We want to build a three-layer hierarchical Bayesian model that:

  1. Captures global/state-level parameters.
  2. Models district-level parameters as being influenced by global/state-level parameters.
  3. Models subdivision/block-level parameters as being influenced by their parent district-level parameters.

Specific Requirements:

  • The block-level parameters should be informed and constrained by district-level parameters, ensuring logical consistency in predictions.
  • Similarly, the district-level parameters should depend on and be constrained by the global/state-level parameters.
  • Predictions for disease case counts at the block level should combine information from:
    • Global trends (state-level or above).
    • District-wide characteristics.
    • Local block-specific attributes.

Current Status:

We have attempted to implement a hierarchical Bayesian model with these ideas in mind and have attached a graphical representation of our trial-based model for reference. However:

  1. We are unsure if the current model correctly captures the relationships between state, district, and block levels.
  2. We need guidance on whether the current structure adheres to hierarchical regression principles, particularly partial pooling across levels.

Request:

We would like:

  1. A detailed verification of the attached model to check if it meets the above requirements.
  2. Suggestions or modifications to improve the model so that:
  • District-level parameters depend on global/state-level parameters.
  • Block-level parameters are influenced by their respective district-level parameters.
  1. Guidance on how to implement this three-layer hierarchical Bayesian model effectively in PyMC.

Please help us construct a model that satisfies the requirements of hierarchical dependency while ensuring logical consistency and interpretable predictions.
This is our code but we are con
fused how to give district data to district params

import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as at


# Prepare indices for districts and blocks
district_idx_train, districts = pd.factorize(df_train.District, sort=True)
block_idx_train, blocks = pd.factorize(df_train.Block, sort=True)

features = ['Literacy Rate', 'Population', 'clean water access', 'Avg Rainfall(mm)', 'Avg Temp(C)']

# Create mapping from block to district
block_to_district = df_train.groupby('Block')['District'].first()
block_district_mapping = pd.factorize(block_to_district)[0]

coords = {
    'district': districts,
    'block': blocks,
    'feature': features,
}

# Define the model
with pm.Model(coords=coords) as hierarchical_model:
    # Define the data
    X = pm.Data('X', df_train[features])
    y = pm.Data('y', df_train['Count'])
    district_idx_pt = pm.Data('district_idx', district_idx_train)
    block_idx_pt = pm.Data('block_idx', block_idx_train)

    # District level hyperpriors
    mu_district = pm.Normal('mu_district', mu=0, sigma=1)
    sigma_district = pm.Exponential('sigma_district', 1)

    # District-level parameters
    a_district = pm.Normal('a_district', mu=mu_district, sigma=sigma_district, dims=['district'])

    # District-level slopes
    mu_b_district = pm.Normal('mu_b_district', mu=0, sigma=0.5, dims=['feature'])
    sigma_b_district = pm.Exponential('sigma_b_district', 1)
    b_district = pm.Normal('b_district', mu=mu_b_district, sigma=sigma_b_district, dims=['district', 'feature'])

    # Block level parameters (nested within districts)
    # Block-level intercepts
    sigma_a_block = pm.Exponential('sigma_a_block', 1)
    a_block = pm.Normal(
        'a_block',
        mu=a_district[block_district_mapping],
        sigma=sigma_a_block,
        dims=['block']
    )

    # Wishart prior for block-level slopes covariance matrices
    nu = len(features) + 1  # Degrees of freedom
    # Identity matrix scaled can be used as a scale matrix 'S'
    S = np.eye(len(features))

    # Covariance matrices for each block with Wishart distribution
    n_features = len(features)  # Number of features
    nu = n_features + 1  # Degrees of freedom (must be > n_features)

    # Define the scale matrix (identity matrix here for simplicity)
    V = np.eye(n_features)

    cov_block = pm.Wishart(
        'cov_block',
        nu=nu,  # Degrees of freedom
        V=V,    # Scale matrix
        dims=('block', 'feature', 'feature')
    )


    # Block-level slopes
    b_block = pm.MvNormal(
        'b_block',
        mu=b_district[block_district_mapping],
        cov=cov_block[block_idx_pt],
        dims=['block', 'feature']
    )

    # Model error
    sigma = pm.Exponential('sigma', 1)

    # Expected value using block-level parameters
    block_coeffs = b_block[block_idx_pt]
    mu = a_block[block_idx_pt] + at.sum(X * block_coeffs, axis=1)

    # Likelihood
    Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=y)

1 Like

From the model graph you posted it looks like everything is implemented correctly. I would just suggest you make the district_to_block and block_to_data maps pm.Data, in case you want to do out-of-sample prediction. You also don’t need the district_idx at all – you can see that it’s not attached to anything in the graph.

For discussion on this type of “telescoping” hierarchy, see here and here. You will need to run prior predictive simulations and parameter recovery studies to make sure your model is identified. In general, I’m not sure these types of models are.

Wishart covariance priors are used in the literature because they have nice conjugate properties with Multivariate normal. PyMC does not take advantage of this. It is recommend to instead use an LKJCholeskyCov prior.