Correlated parameters, uneven group sizes, deep hierarchies

I’m working with hierarchical geographic data, something like this:

Call N the number of regions at the bottom of the hierarchy, and T the number of time steps over which I observe them. Each higher level is just the sum of the lower levels. I am interested in fitting a simple slope-intercept model that allows for information sharing between the different levels of the hierarchy, something like:

y_{i,t} = \alpha_0 + \alpha_{i,1} + \alpha_{i,2} + \alpha_{i,3} + (\gamma_0 + \gamma_{i,1} + \gamma_{i,2} + \gamma_{i,3})\cdot t + \epsilon_{i,t}

In the simple case where all the parameters are independent this is easy enough to do with the usual indexing tricks. I am also curious about the more complex case where the slopes and intercepts are correlated. Apparently one cannot simply draw the offsets from a single centered MVN, because I need a different number of draws at each level (e.g. 5 parameters for the level-1 groups, 7 for the level-2 groups, etc.).
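For reference, the independent version I have in mind looks roughly like this (the group counts, index arrays, time vector, and y_obs below are placeholders standing in for my actual data):

    import pymc as pm

    # n_countries, n_region1, n_region2: number of groups at each level;
    # country_idx, region1_idx, region2_idx: integer arrays of length N mapping each
    # bottom-level unit to its group at each level; time: length-T vector; y_obs: T x N data.
    with pm.Model() as independent_model:
        alpha_0 = pm.Normal("alpha_0", 0.0, 1.0)   # global intercept
        gamma_0 = pm.Normal("gamma_0", 0.0, 1.0)   # global slope

        # independent offsets at each level, each level with its own scale
        sigma_a = pm.HalfNormal("sigma_a", 1.0, shape=3)
        sigma_g = pm.HalfNormal("sigma_g", 1.0, shape=3)
        a1 = pm.Normal("a1", 0.0, sigma_a[0], shape=n_countries)
        a2 = pm.Normal("a2", 0.0, sigma_a[1], shape=n_region1)
        a3 = pm.Normal("a3", 0.0, sigma_a[2], shape=n_region2)
        g1 = pm.Normal("g1", 0.0, sigma_g[0], shape=n_countries)
        g2 = pm.Normal("g2", 0.0, sigma_g[1], shape=n_region1)
        g3 = pm.Normal("g3", 0.0, sigma_g[2], shape=n_region2)

        intercept = alpha_0 + a1[country_idx] + a2[region1_idx] + a3[region2_idx]  # length N
        slope = gamma_0 + g1[country_idx] + g2[region1_idx] + g3[region2_idx]      # length N

        mu = intercept[None, :] + slope[None, :] * time[:, None]                   # T x N
        sigma_obs = pm.HalfNormal("sigma_obs", 1.0)
        pm.Normal("y", mu=mu, sigma=sigma_obs, observed=y_obs)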

Question 1: Short of working out a structured covariance matrix with lots of duplicated elements, is this true? Could I re-purpose a GP covariance function, for example the Matérn function with a suitably chosen \nu (less than 1?), to induce a discontinuity between the hierarchical indices?

So I do something like this:

    sd_dist = pm.Exponential.dist(1.0)  # prior on the six scales, for example
    # Cholesky factor of the 6 x 6 covariance over (3 intercept, 3 slope) offsets
    param_chol, corr, stds = pm.LKJCholeskyCov('param_chol', n=6, eta=1.0, sd_dist=sd_dist)

    # broadcast each level's offsets down to the N bottom-level units
    intercepts = at.stack([offset_intercept_country[country_idx],
                           offset_intercept_region_1[region1_idx],
                           offset_intercept_region_2[region2_idx]], axis=1)
    slopes = at.stack([offset_trend_country[country_idx],
                       offset_trend_region_1[region1_idx],
                       offset_trend_region_2[region2_idx]], axis=1)

    # N x 6 matrix; right-multiplying by the transposed Cholesky factor gives each
    # row the covariance chol @ chol.T
    params = at.concatenate([intercepts, slopes], axis=1).dot(param_chol.T)

Where all the offsets are independent normals of the appropriate size, and the idx arrays are integer indexing arrays that broadcast everything to the size of the bottom level of the hierarchy.
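Concretely, the offsets are defined along these lines inside the same model context (the group counts are again placeholders); they stay unit normals because the scales and correlations come from the Cholesky factor:

    # One unit-normal offset per group at each level; scales and correlations come
    # from param_chol, so these stay standard normal (non-centered parameterization).
    offset_intercept_country = pm.Normal("offset_intercept_country", 0.0, 1.0, shape=n_countries)
    offset_intercept_region_1 = pm.Normal("offset_intercept_region_1", 0.0, 1.0, shape=n_region1)
    offset_intercept_region_2 = pm.Normal("offset_intercept_region_2", 0.0, 1.0, shape=n_region2)

    offset_trend_country = pm.Normal("offset_trend_country", 0.0, 1.0, shape=n_countries)
    offset_trend_region_1 = pm.Normal("offset_trend_region_1", 0.0, 1.0, shape=n_region1)
    offset_trend_region_2 = pm.Normal("offset_trend_region_2", 0.0, 1.0, shape=n_region2)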

In principle this should give me what I want: an N \times 6 matrix of offsets with the desired correlations. The final mean would be something like mu = mu_intercept + params[:, :3].sum(axis=1)[None, :] + (mu_slope + params[:, 3:].sum(axis=1)[None, :]) * time[:, None], giving a T \times N matrix of observations.
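Or, spelled out in code (mu_intercept, mu_slope, sigma, and y_obs are hypothetical names for the global terms, the observation noise, and the data):

    intercept_offsets = params[:, :3].sum(axis=1)   # length N: summed correlated intercept offsets
    slope_offsets = params[:, 3:].sum(axis=1)       # length N: summed correlated slope offsets

    # T x N mean surface and the likelihood
    mu = (mu_intercept + intercept_offsets)[None, :] + (mu_slope + slope_offsets)[None, :] * time[:, None]
    pm.Normal("y", mu=mu, sigma=sigma, observed=y_obs)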

I’m here because this approach samples extremely poorly, even on synthetic data generated using exactly this procedure, and it gets worse as I hold N constant and increase T, which should be the asymptotically “good” case. This makes me think I’m missing something.

Curious if anyone has experience with data like this.