Hierarchical model with uneven categories

I’m working with a hierarchical model and I have some conceptual doubts about how to declare hierarchies.

I have hierarchies that are product categories (e.g., food, beverages) and product families (e.g., within food → fruits, chocolates; within beverages → water, soft drinks). Therefore, the fruit family exists only within the food category, and has no relationship to the beverages category.

The way I’m stating the hierarchies today is as follows:

idx_category, category = df["category"].factorize()
idx_family, family = df["family"].factorize()

coords = {
    "obs_id": idx_category,
    "category": category,
    "family": family,
}

with pm.Model(coords=coords) as model:
    price = pm.MutableData("price", df["log_price"].values, dims="obs_id")

    # price influence
    a_global = pm.Normal("a_global", mu=5, sigma=10)
    
    sigma_a_cat = pm.HalfNormal("sigma_a_cat", sigma=2)
    a_cat = pm.Normal("a_cat", mu=a_global, sigma=sigma_a_cat, dims="category")
    
    sigma_a_fam = pm.HalfNormal("sigma_a_fam", sigma=2, dims="category")
    a_fam = pm.Normal("a_fam", mu=a_cat, sigma=sigma_a_fam, dims=("family", "category"))

    # intercept
    T0_global = pm.Normal("T0_global", mu=10, sigma=10)
    
    sigma_T0_cat = pm.HalfNormal("sigma_T0_cat", sigma=2)
    T0_cat = pm.Normal("T0_cat", mu=T0_global, sigma=sigma_T0_cat, dims="category")

    sigma_T0_fam = pm.HalfNormal("sigma_T0_fam", sigma=2, dims="category")
    T0_fam = pm.Normal("T0_fam", mu=T0_cat, sigma=sigma_T0_fam, dims=("family", "category"))

    mu = T0_fam[idx_family, idx_category] + a_fam[idx_family, idx_category] * price
    Y_obs = pm.Exponential("Y_obs", lam=1/mu, observed=df["y_obs"].values, dims="obs_id")

However, with this approach, because of dims=("family", "category"), I end up with parameters like a_fam[fruits, beverages], which don’t make sense. Sampling also takes a long time to run, and my hypothesis is that this explosion of unnecessary parameters is hurting performance.
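To make the scale of the problem concrete, here is a back-of-the-envelope count (toy numbers matching the food/beverages example, not my real data): with F distinct families and C categories, dims=("family", "category") allocates F × C parameters even though only F (family, category) pairs ever occur.

```python
# Toy counts, for illustration only
families = {"food": ["fruits", "chocolates"],
            "beverages": ["water", "soft drinks"]}
n_categories = len(families)                         # 2
n_families = sum(len(f) for f in families.values())  # 4 distinct families overall

full_grid = n_families * n_categories  # dims=("family", "category") allocates 8
valid_pairs = n_families               # but only 4 combinations exist in the data
print(full_grid, valid_pairs)
```

With more realistic category/family counts the gap between the full grid and the valid pairs grows quickly.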

What would be the best way to declare hierarchies in this situation?

Bonus question: In this model, I only have hierarchies on the mu parameters for each pm.Normal - i.e., a_cat mu’s comes from a_global and a_fam mu’s comes from a_cat - but sigma_a_cat and sigma_a_fam are independent of the hierarchical leveling. Is this the right approach or should I also create a hierarchy of sigmas? My fear here is that with independent sigmas, things deviate a lot from the higher hierarchy.

Some updates here - I’ve been experimenting a bit with different approaches, I think this is one that fits what I needed:

Basically, instead of defining the last hierarchy as dims=("family", "category"), I can create a vector category_family with valid combinations and specify which category distribution to use from the higher hierarchy with a mapping.

df["category_family"] = df["category"] + "__" + df["family"]

idx_category_family, category_family = df["category_family"].factorize()
mapping_category_family = [list(category).index(cat_fam.split("__")[0]) for cat_fam in category_family]

coords["category_family"] = category_family
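As a sanity check, the mapping logic above can be run on a toy dataframe (illustrative rows, not my real data):

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["food", "food", "beverages", "beverages"],
    "family": ["fruits", "chocolates", "water", "soft drinks"],
})

idx_category, category = df["category"].factorize()
df["category_family"] = df["category"] + "__" + df["family"]
idx_category_family, category_family = df["category_family"].factorize()

# Each valid (category, family) combination points at its parent category index
mapping_category_family = [
    list(category).index(cat_fam.split("__")[0]) for cat_fam in category_family
]
print(mapping_category_family)  # -> [0, 0, 1, 1]
```

So the two food families map to category index 0 and the two beverage families to index 1, with no invalid combinations allocated.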

with pm.Model(coords=coords) as model:
    ... # everything else stays the same
    a_fam = pm.Normal("a_fam", mu=a_cat[mapping_category_family], sigma=sigma_a_fam[mapping_category_family], dims="category_family")
    ...
    T0_fam = pm.Normal("T0_fam", mu=T0_cat[mapping_category_family], sigma=sigma_T0_fam[mapping_category_family], dims="category_family")

    mu = T0_fam[idx_category_family] + a_fam[idx_category_family] * price
    ...

I feel like this should be the default way of declaring the hierarchies to get the result I want, right? Still, I would appreciate any insight into the hierarchy of sigmas - is this something worth trying?

Maybe you want “telescoping” hierarchies? I have used code like this before:

def make_next_level_hierarchy_variable(name, mu, alpha, beta, mapping=None, sigma_dims=None, offset_dims=None):
    sigma_ = pm.Gamma(f'{name}_sigma', alpha=alpha, beta=beta, dims=sigma_dims)
    offset = pm.Normal(f'{name}_offset', dims=offset_dims)

    if mapping is None:
        return pm.Deterministic(f'{name}', mu[:, None] + sigma_[:, None] * offset, dims=offset_dims)
    else:
        return pm.Deterministic(f'{name}', mu[:, mapping] + sigma_[:, mapping] * offset, dims=offset_dims)

What this function needs is a mapping from one level to the next. For example, suppose you had data like this:

    level_0              level_1          level_2         sales
0  Beverage          Soft Drinks         Lemonade   8003.892152
1  Beverage                 Milk         Oat Milk   4080.121763
2     Snack                Jerky      Vegan Jerky  32426.803818
3  Beverage            Smoothies  Green Smoothies     17.856084
4     Snack             Pretzels  Salted Pretzels   2143.973901
5     Snack                 Nuts          Cashews    303.013850
6  Beverage        Sports Drinks         Poweraid   3269.652452
7  Beverage                  Tea        Green Tea     16.907952
8  Beverage  Alcoholic Beverages          Spirits   2863.434985
9  Beverage                  Tea        Black Tea  15262.873090

You need to construct indexes mapping from one level to the next. That is, all rows that are tea should be centered on the prior for beverage, and all rows of Black Tea should be centered on the prior for Tea. You need df.level_0.nunique() level zero parameters, then df.level_1.nunique() level 1 parameters, and so on.

I do this in two steps. First, factorize all the levels of the data:

level_0_labels, level_0_idx = pd.factorize(df.level_0)
level_1_labels, level_1_idx = pd.factorize(df.level_1)
level_2_labels, level_2_idx = pd.factorize(df.level_2)

Then make dictionaries of unique edges, thinking of the categories as a network in which each child node points to its parent at the level above. You can build these like this:

edges = df[['level_0', 'level_1', 'level_2']].drop_duplicates()

level_0_idx_by_level_1 = (edges[['level_0', 'level_1']]
                              .drop_duplicates()
                              .set_index('level_1')['level_0']
                              .sort_index()
                              .map({k:i for i, k in enumerate(level_0_labels)}).values)
level_1_idx_by_level_2 = (edges[['level_1', 'level_2']]
                              .drop_duplicates()
                              .set_index('level_2')['level_1']
                              .sort_index()
                              .map({k:i for i, k in enumerate(level_1_labels)}).values)

These are indexes you can use to “telescope” from one level of prior to the next. For example, here’s level_0_idx_by_level_1:

array([0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0])

You will use this to index a variable level_0_mu = pm.Normal('level_0_mu', 0, 1, dims=['level_0']). In my example, level_0 has two values, Beverage and Snack. Indexing with this will blow it up to len(coords['level_1']), so it can then serve as the mean of the level 1 prior, as in:

level_1_sigma = pm.HalfNormal('level_1_sigma', 1)
level_1_offset = pm.Normal('level_1_offset', 0, 1, dims=['level_1'])
level_1_effect = pm.Deterministic('level_1_effect', level_0_mu[level_0_idx_by_level_1] + level_1_sigma * level_1_offset, dims=['level_1'])
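The same indexing trick can be sketched with plain numpy (illustrative values, not real posterior draws):

```python
import numpy as np

level_0_mu = np.array([0.5, -0.3])               # one value per level-0 group
level_0_idx_by_level_1 = np.array([0, 1, 1, 0])  # parent index of each level-1 group

# Fancy indexing "blows up" the level-0 vector to level-1 length,
# giving each level-1 group the prior center of its parent
level_1_center = level_0_mu[level_0_idx_by_level_1]
print(level_1_center)  # -> [ 0.5 -0.3 -0.3  0.5]
```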

Using my function, it looks like:

level_0_mu = pm.Normal('level_0_mu', 0, 1, dims=['level_0'])
level_1_effect = make_next_level_hierarchy_variable('level_1_effect', mu=level_0_mu, alpha=2, beta=1, mapping=level_0_idx_by_level_1, sigma_dims=['level_0'], offset_dims=['level_1'])
level_2_effect = make_next_level_hierarchy_variable('level_2_effect', mu=level_1_effect, alpha=2, beta=1, mapping=level_1_idx_by_level_2, sigma_dims=['level_1'], offset_dims=['level_2'])

Which is what the helper function I started the post with does for you. In this way, each value in level 1 is centered on its associated value from level zero, and so on. The length of the last level in the hierarchy should be the number of observations.

Re: bonus question, the sigmas in a hierarchical model describe the spread of individual effects around the population mean. In principle, I suppose sigma could also be seen as an “effect” coming from a population. But as a practical matter, you only have one sigma per hierarchy level. In my example there are only three, which isn’t really enough to fit a hierarchical model on. I’ve read that the recommended minimum for going hierarchical is 7 groups, though I don’t have a citation at hand. In general, it’s harder to do inference on scale parameters than on location parameters, so people tend to talk about scale parameters less.


Interesting. Could one say this is because each group contributes roughly a single data point when estimating the hierarchical sigma? I believe the general rule of thumb for estimating a standard deviation in the classical setting is 5+ observations.


Hi @jessegrabowski ,

I am trying to resolve my issue using this response; however, I get stuck on the following parts:

1. When you put
level_0_labels, level_0_idx = pd.factorize(df.level_0)
level_1_labels, level_1_idx = pd.factorize(df.level_1)
level_2_labels, level_2_idx = pd.factorize(df.level_2)

Isn’t it the other way around? That is to say:

level_0_idx, level_0_labels = pd.factorize(df.level_0)
level_1_idx, level_1_labels = pd.factorize(df.level_1)
level_2_idx, level_2_labels = pd.factorize(df.level_2)

I ask because when applying the .map({k:i for i, k in enumerate(level_1_labels)}).values, my vector comes out all NaN. Doing it the other way around, I get a vector of indexes of size df[['level_0', 'level_1']].nunique().
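For what it’s worth, a quick toy check of pandas’ return order:

```python
import pandas as pd

# pd.factorize returns (codes, uniques), in that order
codes, uniques = pd.factorize(pd.Series(["Beverage", "Snack", "Beverage"]))
print(codes)          # integer code per row -> [0 1 0]
print(list(uniques))  # unique labels, in order of first appearance
```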

2. Doing what I understood from your answer, I get the following error:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
[...]
----> 8     return pm.Deterministic(f'{name}', mu[:, mapping] + sigma_[:, mapping] * offset, dims=offset_dims)

File ~/opt/anaconda3/envs/pymc_env/lib/python3.11/site-packages/pytensor/tensor/var.py:501, in _tensor_py_operators.__getitem__(self, args)
    499 # Check if the number of dimensions isn't too large.
    500 if self.ndim < index_dim_count:
--> 501     raise IndexError("too many indices for array")
    503 # Convert an Ellipsis if provided into an appropriate number of
    504 # slice(None).
    505 if len(ellipses) > 1:

IndexError: too many indices for array

The structure of my dataframe is really similar to your food & beverages answer. This is my code so far:

def make_next_level_hierarchy_variable(name, mu, alpha, beta, mapping=None, sigma_dims=None, offset_dims=None):
    sigma_ = pm.Gamma(f'{name}_sigma', alpha=alpha, beta=beta, dims=sigma_dims)
    offset = pm.Normal(f'{name}_offset', dims=offset_dims)

    if mapping is None:
        return pm.Deterministic(f'{name}', mu[:, None] + sigma_[:, None] * offset, dims=offset_dims)
    else:
        return pm.Deterministic(f'{name}', mu[:, mapping] + sigma_[:, mapping] * offset, dims=offset_dims)

# Get edges
level_municipality_idx, level_municipality_labels = pd.factorize(df_work['cve_mun'])
level_sepomex_idx, level_sepomex_labels = pd.factorize(df_work['id_sepomex'])
level_geo_idx, level_geo_labels = pd.factorize(df_work['cvegeo'])

# Dictionaries of edges
# get dictionary of unique edges
df_edges = df_work[['cve_mun', 'id_sepomex', 'cvegeo']].drop_duplicates()

# level municipality to sepomex
level_municipality_by_sepomex = (
    df_edges[['cve_mun', 'id_sepomex']]
    .drop_duplicates()
    .set_index('id_sepomex')['cve_mun']
    .sort_index()
    .map({label: idx for idx, label in enumerate(level_municipality_labels)})
    .values
)
# level sepomex to geo
level_sepomex_by_geo = (
    df_edges[['id_sepomex', 'cvegeo']]
    .drop_duplicates()
    .set_index('cvegeo')['id_sepomex']
    .sort_index()
    .map({label: idx for idx, label in enumerate(level_sepomex_labels)})
    .values
)

# build hierarchical model telescoping mun -> sepomex -> geo
with pm.Model() as hierarchical_model:
    # Hyperpriors
    mu_mun = pm.Normal('mu_mun', mu=0, sigma=1, dims=['cve_mun'])

    # Priors
    # sepomex
    sepomex_effect = make_next_level_hierarchy_variable(
        'sepomex_effect',
        mu_mun,
        alpha=10,
        beta=1,
        mapping=level_municipality_by_sepomex,
        sigma_dims=['sepomex']
    )
    # geo
    geo_effect = make_next_level_hierarchy_variable(
        'geo_effect',
        sepomex_effect,
        alpha=10,
        beta=1,
        mapping=level_sepomex_by_geo,
        sigma_dims=['cvegeo']
    )

    # Likelihood
    sigma_price = pm.Exponential('sigma_price', 100, dims='obs_id')
    mu_price = pm.Deterministic('mu_price', geo_effect, dims='obs_id')
    price = pm.LogNormal('price', mu=mu_price, sigma=sigma_price, observed=df_work['price'], dims='obs_id')
3. I don’t understand well in which part of the pm.Model() I connect level_0 with level_1 and level_1 with level_2. Also, which coords should I use inside pm.Model()?

Can you please share how you would build the model inside the with block?

Thanks a lot!
RAVJ