Maybe you want “telescoping” hierarchies? I have used code like this before:
```python
def make_next_level_hierarchy_variable(name, mu, alpha, beta, mapping=None, sigma_dims=None, offset_dims=None):
    # One scale parameter per parent group (or a single scalar if sigma_dims is None)
    sigma_ = pm.Gamma(f'{name}_sigma', alpha=alpha, beta=beta, dims=sigma_dims)
    # Non-centered offsets, one per child group
    offset = pm.Normal(f'{name}_offset', dims=offset_dims)

    if mapping is None:
        return pm.Deterministic(f'{name}', mu[..., None] + sigma_[..., None] * offset, dims=offset_dims)
    else:
        # mapping expands the parent-level parameters out to the length of the child level
        return pm.Deterministic(f'{name}', mu[..., mapping] + sigma_[..., mapping] * offset, dims=offset_dims)
```
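The key piece is the fancy indexing in the `mapping` branch. Here's a minimal numpy sketch of what that indexing does (the numbers are made up for illustration):

```python
import numpy as np

mu = np.array([10.0, 20.0])       # one mean per parent group, e.g. Beverage and Snack
mapping = np.array([0, 1, 1, 0])  # index of each child group's parent
print(mu[..., mapping])           # [10. 20. 20. 10.] -- parent means expanded to child length
```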
What this function needs is a mapping from one level to the next. For example, suppose you had data like this:
```
    level_0              level_1          level_2         sales
0  Beverage          Soft Drinks         Lemonade   8003.892152
1  Beverage                 Milk         Oat Milk   4080.121763
2     Snack                Jerky      Vegan Jerky  32426.803818
3  Beverage            Smoothies  Green Smoothies     17.856084
4     Snack             Pretzels  Salted Pretzels   2143.973901
5     Snack                 Nuts          Cashews    303.013850
6  Beverage        Sports Drinks         Poweraid   3269.652452
7  Beverage                  Tea        Green Tea     16.907952
8  Beverage  Alcoholic Beverages          Spirits   2863.434985
9  Beverage                  Tea        Black Tea  15262.873090
```
You need to construct indexes mapping from one level to the next. That is, all rows of `Tea` should be centered on the prior for `Beverage`, and all rows of `Black Tea` should be centered on the prior for `Tea`. You need `df.level_0.nunique()` level 0 parameters, then `df.level_1.nunique()` level 1 parameters, and so on.
I do this in two steps. First, factorize all the levels of the data:
```python
# pd.factorize returns (codes, uniques): the integer codes first, the unique labels second.
# sort=True keeps the label order consistent with the sorted mappings built below.
level_0_idx, level_0_labels = pd.factorize(df.level_0, sort=True)
level_1_idx, level_1_labels = pd.factorize(df.level_1, sort=True)
level_2_idx, level_2_labels = pd.factorize(df.level_2, sort=True)
```
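The unique labels also give you the model coords for free. A minimal sketch, assuming the same dim names used in the model code below:

```python
coords = {
    'level_0': level_0_labels,
    'level_1': level_1_labels,
    'level_2': level_2_labels,
}

with pm.Model(coords=coords) as model:
    ...
```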
Then make dictionaries of unique edges, thinking of the categories as a network: each level 0 node is connected to its level 1 children, and each level 1 node to its level 2 children. You can build the parent indexes like this:
```python
edges = df[['level_0', 'level_1', 'level_2']].drop_duplicates()

# For each level_1 category (sorted alphabetically), the integer index of its level_0 parent
level_0_idx_by_level_1 = (edges[['level_0', 'level_1']]
                          .drop_duplicates()
                          .set_index('level_1')['level_0']
                          .sort_index()
                          .map({k: i for i, k in enumerate(level_0_labels)})
                          .values)

# For each level_2 category (sorted alphabetically), the integer index of its level_1 parent
level_1_idx_by_level_2 = (edges[['level_1', 'level_2']]
                          .drop_duplicates()
                          .set_index('level_2')['level_1']
                          .sort_index()
                          .map({k: i for i, k in enumerate(level_1_labels)})
                          .values)
```
These are indexes you can use to “telescope” from one level of prior to the next. For example, here's `level_0_idx_by_level_1`:

```
array([0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0])
```
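If you want to sanity-check a mapping, you can print each child next to the parent it points to (a quick illustration using the objects built above):

```python
for child, parent_idx in zip(sorted(edges['level_1'].unique()), level_0_idx_by_level_1):
    print(f'{child} -> {level_0_labels[parent_idx]}')  # e.g. 'Tea -> Beverage'
```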
You will use `level_0_idx_by_level_1` to index a variable `level_0_mu = pm.Normal('level_0_mu', 0, 1, dims=['level_0'])`. In my example, `level_0` has two values, Beverage and Snack. Indexing with the mapping will blow it up to `len(coords['level_1'])`, so it can then serve as the mean of the level 1 prior, as in:
```python
level_1_sigma = pm.HalfNormal('level_1_sigma', 1)
level_1_offset = pm.Normal('level_1_offset', 0, 1, dims=['level_1'])
level_1_effect = pm.Deterministic('level_1_effect',
                                  level_0_mu[level_0_idx_by_level_1] + level_1_sigma * level_1_offset,
                                  dims=['level_1'])
```
Using my function, it looks like this:

```python
level_0_mu = pm.Normal('level_0_mu', 0, 1, dims=['level_0'])
level_1_effect = make_next_level_hierarchy_variable('level_1_effect', mu=level_0_mu, alpha=2, beta=1,
                                                    mapping=level_0_idx_by_level_1,
                                                    sigma_dims=['level_0'], offset_dims=['level_1'])
level_2_effect = make_next_level_hierarchy_variable('level_2_effect', mu=level_1_effect, alpha=2, beta=1,
                                                    mapping=level_1_idx_by_level_2,
                                                    sigma_dims=['level_1'], offset_dims=['level_2'])
```
This is exactly what the helper function I started the post with does for you: each value in level 1 is centered on its associated value from level 0, and so on down the hierarchy. The length of the last level in the hierarchy should match the number of observations.
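To connect the bottom level to the data, index the last effect with the row-level codes from `pd.factorize`. A minimal sketch of the likelihood, assuming a normal observation model purely for illustration (given how skewed the sales are, you may prefer to model log sales):

```python
noise = pm.HalfNormal('noise', 1)
sales_hat = pm.Normal('sales_hat',
                      mu=level_2_effect[level_2_idx],  # one value per row of df
                      sigma=noise,
                      observed=df.sales)
```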
Re: the bonus question, the sigmas in a hierarchical model describe the spread of individual effects around the population mean. In principle, I suppose sigma could also be seen as an “effect” drawn from a population. As a practical matter, though, you only have one sigma per level of the hierarchy. In my example there are only three, which isn't really enough to do a hierarchical model on. I've read that the recommended minimum for going hierarchical is 7 groups, but I don't have a citation at hand. In general, it's harder to do inference on scale parameters than on location parameters, so people tend to talk about scale parameters less.
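For what it's worth, if you did have enough groups to hierarchically pool the sigmas, the usual trick is to build the hierarchy on the log scale so the result stays positive. A hypothetical sketch (the `group` dim and the priors here are invented for illustration):

```python
log_sigma_mu = pm.Normal('log_sigma_mu', 0, 1)
log_sigma_sd = pm.HalfNormal('log_sigma_sd', 1)
log_sigma_offset = pm.Normal('log_sigma_offset', dims=['group'])
sigma = pm.Deterministic('sigma',
                         pm.math.exp(log_sigma_mu + log_sigma_sd * log_sigma_offset),
                         dims=['group'])
```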