Hierarchical model with uneven categories

Hi @jessegrabowski ,

I am trying to resolve this issue using this response, but I am stuck on the following points:

  1. When you write:
level_0_labels, level_0_idx = pd.factorize(df.level_0)
level_1_labels, level_1_idx = pd.factorize(df.level_1)
level_2_labels, level_2_idx = pd.factorize(df.level_2)

Isn't it the other way around? That is to say:

level_0_idx, level_0_labels = pd.factorize(df.level_0)
level_1_idx, level_1_labels = pd.factorize(df.level_1)
level_2_idx, level_2_labels = pd.factorize(df.level_2)

I ask because when I apply .map({k: i for i, k in enumerate(level_1_labels)}).values, my vector is all NaN. Doing it the other way around, I get a vector of indexes of size df[['level_0', 'level_1']].nunique().
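
As a sanity check on the return order, here is a minimal sketch with a made-up toy column (the data and variable names are hypothetical, not taken from the original answer). As far as I can tell, `pd.factorize` returns the integer codes first and the unique labels second, which would explain the all-NaN result:

```python
import pandas as pd

# Hypothetical toy column, only to illustrate the return order.
toy = pd.Series(["food", "beverage", "food", "snack"])

# First value: one integer code per row. Second value: the distinct labels,
# in order of first appearance.
codes, uniques = pd.factorize(toy)

# Label -> position lookup built from the unique labels.
label_to_idx = {label: idx for idx, label in enumerate(uniques)}
mapped = toy.map(label_to_idx)

# If the two return values are unpacked in the swapped order, the dictionary
# keys become integers, so mapping string labels through it gives only NaN.
swapped_labels, swapped_idx = pd.factorize(toy)
bad_mapping = {k: i for i, k in enumerate(swapped_labels)}
bad_mapped = toy.map(bad_mapping)
```

With the codes-first unpacking, `mapped` contains integer positions; with the swapped unpacking, `bad_mapped` is entirely NaN.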

  2. Following what I understood from your answer, I get the following error:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
[...]
----> 8     return pm.Deterministic(f'{name}', mu[:, mapping] + sigma_[:, mapping] * offset, dims=offset_dims)

File ~/opt/anaconda3/envs/pymc_env/lib/python3.11/site-packages/pytensor/tensor/var.py:501, in _tensor_py_operators.__getitem__(self, args)
    499 # Check if the number of dimensions isn't too large.
    500 if self.ndim < index_dim_count:
--> 501     raise IndexError("too many indices for array")
    503 # Convert an Ellipsis if provided into an appropriate number of
    504 # slice(None).
    505 if len(ellipses) > 1:

IndexError: too many indices for array
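
Reading the traceback, the check that fails is `self.ndim < index_dim_count`, so my guess is that `mu` is one-dimensional while `mu[:, mapping]` supplies two indices. A minimal NumPy sketch of the same situation (toy values and hypothetical names; I am assuming PyTensor follows NumPy's indexing rules here):

```python
import numpy as np

# Hypothetical stand-ins: a 1-D parent vector and a child -> parent mapping.
mu = np.array([0.1, 0.2, 0.3])        # one value per parent group
mapping = np.array([0, 0, 1, 2, 2])   # parent index for each of 5 children

# Two indices on a 1-D array: this is the pattern from my function.
try:
    mu[:, mapping]
    raised = False
except IndexError:
    raised = True

# A single index works and expands the parents to one value per child.
expanded = mu[mapping]
```

If that reading is right, `mu[mapping]` would be the form to use when `mu` has no leading dimension, and `mu[:, mapping]` only when it does.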

The structure of my dataframe is very similar to the food & beverages example in your answer. This is my code so far:

import pandas as pd
import pymc as pm

def make_next_level_hierarchy_variable(name, mu, alpha, beta, mapping=None, sigma_dims=None, offset_dims=None):
    sigma_ = pm.Gamma(f'{name}_sigma', alpha=alpha, beta=beta, dims=sigma_dims)
    offset = pm.Normal(f'{name}_offset', dims=offset_dims)

    if mapping is None:
        return pm.Deterministic(f'{name}', mu[:, None] + sigma_[:, None] * offset, dims=offset_dims)
    else:
        return pm.Deterministic(f'{name}', mu[:, mapping] + sigma_[:, mapping] * offset, dims=offset_dims)

# Get edges
level_municipality_idx, level_municipality_labels = pd.factorize(df_work['cve_mun'])
level_sepomex_idx, level_sepomex_labels = pd.factorize(df_work['id_sepomex'])
level_geo_idx, level_geo_labels = pd.factorize(df_work['cvegeo'])

# Dictionaries of edges
# get dictionary of unique edges
df_edges = df_work[['cve_mun', 'id_sepomex', 'cvegeo']].drop_duplicates()

# level municipality to sepomex
level_municipality_by_sepomex = (
    df_edges[['cve_mun', 'id_sepomex']]
    .drop_duplicates()
    .set_index('id_sepomex')['cve_mun']
    .sort_index()
    .map({label: idx for idx, label in enumerate(level_municipality_labels)})
    .values
)
# level sepomex to geo
level_sepomex_by_geo = (
    df_edges[['id_sepomex', 'cvegeo']]
    .drop_duplicates()
    .set_index('cvegeo')['id_sepomex']
    .sort_index()
    .map({label: idx for idx, label in enumerate(level_sepomex_labels)})
    .values
)

# build hierarchical model telescoping mun -> sepomex -> geo
with pm.Model() as hierarchical_model:
    # Hyperpriors
    mu_mun = pm.Normal('mu_mun', mu=0, sigma=1, dims=['cve_mun'])

    # Priors
    # sepomex
    sepomex_effect = make_next_level_hierarchy_variable(
        'sepomex_effect',
        mu_mun,
        alpha=10,
        beta=1,
        mapping=level_municipality_by_sepomex,
        sigma_dims=['sepomex']
    )
    # geo
    geo_effect = make_next_level_hierarchy_variable(
        'geo_effect',
        sepomex_effect,
        alpha=10,
        beta=1,
        mapping=level_sepomex_by_geo,
        sigma_dims=['cvegeo']
    )

    # Likelihood
    sigma_price = pm.Exponential('sigma_price', 100, dims='obs_id')
    mu_price = pm.Deterministic('mu_price', geo_effect, dims='obs_id')
    price = pm.LogNormal('price', mu=mu_price, sigma=sigma_price, observed=df_work['price'], dims='obs_id')

  3. I don't fully understand where in the pm.Model() block I should connect level_0 with level_1 and level_1 with level_2, or which coords I should pass to pm.Model().

Could you please share how you would build the model inside the with block?
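
To make the question concrete, this is how I currently picture the connection between levels, written with plain NumPy and pandas instead of PyMC variables. The toy dataframe and all values are made up; only the column names match my data. I use `sort=True` in `pd.factorize` on the assumption that the label order has to match the `sort_index()` order of the mappings:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data using the same column names as my dataframe.
df_toy = pd.DataFrame({
    "cve_mun":    ["A", "A", "A", "B", "B"],
    "id_sepomex": ["s1", "s1", "s2", "s3", "s3"],
    "cvegeo":     ["g1", "g2", "g3", "g4", "g4"],
})

# sort=True so the codes follow sorted label order, matching sort_index() below.
mun_idx, mun_labels = pd.factorize(df_toy["cve_mun"], sort=True)
sep_idx, sep_labels = pd.factorize(df_toy["id_sepomex"], sort=True)
geo_idx, geo_labels = pd.factorize(df_toy["cvegeo"], sort=True)

edges = df_toy.drop_duplicates()

# One entry per sepomex: the index of its parent municipality.
mun_by_sepomex = (
    edges[["cve_mun", "id_sepomex"]].drop_duplicates()
    .set_index("id_sepomex")["cve_mun"].sort_index()
    .map({label: i for i, label in enumerate(mun_labels)})
    .to_numpy()
)
# One entry per cvegeo: the index of its parent sepomex.
sepomex_by_geo = (
    edges[["id_sepomex", "cvegeo"]].drop_duplicates()
    .set_index("cvegeo")["id_sepomex"].sort_index()
    .map({label: i for i, label in enumerate(sep_labels)})
    .to_numpy()
)

# Candidate coords dictionary that I would pass as pm.Model(coords=coords).
coords = {
    "cve_mun": mun_labels,
    "id_sepomex": sep_labels,
    "cvegeo": geo_labels,
    "obs_id": df_toy.index,
}

# Plain numbers standing in for the random variables, to check shapes only.
mu_mun = np.array([10.0, 20.0])                     # one per municipality
sepomex_effect = mu_mun[mun_by_sepomex] + 1.0       # one per sepomex
geo_effect = sepomex_effect[sepomex_by_geo] + 0.1   # one per cvegeo
mu_obs = geo_effect[geo_idx]                        # one per observation
```

If this picture is correct, each level is connected to the next by indexing the parent with the mapping, and the observation-level mean needs the extra `geo_effect[geo_idx]` step, which my model above does not have.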

Thanks a lot!
RAVJ