Hierarchical model with uneven categories

Maybe you want “telescoping” hierarchies? I have used code like this before:

def make_next_level_hierarchy_variable(name, mu, alpha, beta, mapping=None, sigma_dims=None, offset_dims=None):
    sigma_ = pm.Gamma(f'{name}_sigma', alpha=alpha, beta=beta, dims=sigma_dims)
    offset = pm.Normal(f'{name}_offset', dims=offset_dims)

    if mapping is None:
        return pm.Deterministic(f'{name}', mu[:, None] + sigma_[:, None] * offset, dims=offset_dims)
    else:
        return pm.Deterministic(f'{name}', mu[:, mapping] + sigma_[:, mapping] * offset, dims=offset_dims)

What this function needs is a mapping from one level to the next. For example, suppose you had data like this:

    level_0              level_1          level_2         sales
0  Beverage          Soft Drinks         Lemonade   8003.892152
1  Beverage                 Milk         Oat Milk   4080.121763
2     Snack                Jerky      Vegan Jerky  32426.803818
3  Beverage            Smoothies  Green Smoothies     17.856084
4     Snack             Pretzels  Salted Pretzels   2143.973901
5     Snack                 Nuts          Cashews    303.013850
6  Beverage        Sports Drinks         Poweraid   3269.652452
7  Beverage                  Tea        Green Tea     16.907952
8  Beverage  Alcoholic Beverages          Spirits   2863.434985
9  Beverage                  Tea        Black Tea  15262.873090

You need to construct indexes mapping from one level to the next. That is, all rows that are tea should be centered on the prior for beverage, and all rows of Black Tea should be centered on the prior for Tea. You need df.level_0.nunique() level zero parameters, then df.level_1.nunique() level 1 parameters, and so on.

I do this in two steps. First, factorize all the levels of the data:

level_0_labels, level_0_idx = pd.factorize(df.level_0)
level_1_labels, level_1_idx = pd.factorize(df.level_1)
level_2_labels, level_2_idx = pd.factorize(df.level_2)

Then make dictionaries of unique edges, thinking about the categories as a network like this:

Which you can do like this:

edges = df[['level_0', 'level_1', 'level_2']].drop_duplicates()

level_0_idx_by_level_1 = (edges[['level_0', 'level_1']]
                              .drop_duplicates()
                              .set_index('level_1')['level_0']
                              .sort_index()
                              .map({k:i for i, k in enumerate(level_0_labels)}).values)
level_1_idx_by_level_2 = (edges[['level_1', 'level_2']]
                              .drop_duplicates()
                              .set_index('level_2')['level_1']
                              .sort_index()
                              .map({k:i for i, k in enumerate(level_1_labels)}).values)

These are indexes you can use to “telescope” from one level of prior to the next. For example, here’s level_0_idx_by_level_1:

array([0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0])

You will use this to index a variable level_0_mu = pm.Normal('level_0_mu', 0, 1, dims=['level_0']). In my example, level_0 has two values, Beverage and Snack. Indexing with this will blow it up to len(coords['level_1']), so it can then serve as the mean of the level 1 prior, as in:

level_1_sigma = pm.HalfNormal('level_1_sigma', 1)
level_1_offset = pm.Normal('level_1_offset', 0, 1, dims=['level_1'])
level_1_effect = pm.Deterministic('level_1_effect', level_0_mu[level_0_idx_by_level_1] + level_1_sigma * level_1_offset, dims=['level_1'])

Using my function, it looks like:

level_0_mu = pm.Normal('level_0_mu', 0, 1, dims=['level_0']
level_1_effect = make_next_level_hierarchy_variable('level_1_effect', mu=level_0_mu, alpha=2, beta=1, mapping=level_0_idx_by_level_1, sigma_dims=['level_0'])
level_2_effect = make_next_level_hierarchy_variable('level_2_effect', mu=level_1_mu, alpha=2, beta=1, mapping=level_1_idx_by_level_2, sigma_dims=['level_1'])

Which is what the helper function I started the post with does for you. In this way, each value in level 1 is centered on its associated value from level zero, and so on. The length of the last level in the hierarchy should be the number of observations.

Re: bonus question, the sigmas in a hierarchical model describe the spread of individual effects around the population mean. In principle, I suppose it’s possible that sigma could also be seen as “effects” coming from a population.But as a practical matter, you only have one sigma per hierarchy. In my example, there are only three, which isn’t really enough to do a hierarchical model on. I read that the recommended minimum to go for hierarchy is 7 groups, but I don’t have a citation at hand. In general, it’s harder to do inference about scale parameters than location parameters, so people tend to talk about scale parameters less.

2 Likes