Hi @jessegrabowski,
I am trying to resolve this issue using your response, but I get stuck on the following parts:
- When you write
level_0_labels, level_0_idx = pd.factorize(df.level_0)
level_1_labels, level_1_idx = pd.factorize(df.level_1)
level_2_labels, level_2_idx = pd.factorize(df.level_2)
Isn't it the other way around? That is to say:
level_0_idx, level_0_labels = pd.factorize(df.level_0)
level_1_idx, level_1_labels = pd.factorize(df.level_1)
level_2_idx, level_2_labels = pd.factorize(df.level_2)
I ask because when applying .map({k: i for i, k in enumerate(level_1_labels)}).values, my vector is all NaN. Doing it the other way around, I get a vector of indexes of size df[['level_0', 'level_1']].nunique().
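For reference, pd.factorize returns a tuple (codes, uniques): the integer index array comes first and the unique labels second. Below is a minimal sketch with made-up data (the series s and its values are hypothetical, standing in for df.level_1) that shows why building the map from the wrong element gives all NaN:

```python
import pandas as pd

# Hypothetical toy column standing in for df.level_1
s = pd.Series(["b", "a", "b", "c"])

# factorize returns (codes, uniques): integer codes first, unique labels second
level_1_idx, level_1_labels = pd.factorize(s)

# Map built from the labels: each label goes to its integer code
label_to_idx = {k: i for i, k in enumerate(level_1_labels)}
mapped = s.map(label_to_idx).values

# Map built from the codes by mistake: the keys are integers, so no label matches
wrong_map = {k: i for i, k in enumerate(level_1_idx)}
all_nan = s.map(wrong_map).values
```

With the names in the order idx, labels, the mapped vector reproduces the integer codes; with the names swapped, every lookup misses and the result is NaN.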
- Following what I understood from your answer, I get the following error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
[...]
----> 8 return pm.Deterministic(f'{name}', mu[:, mapping] + sigma_[:, mapping] * offset, dims=offset_dims)
File ~/opt/anaconda3/envs/pymc_env/lib/python3.11/site-packages/pytensor/tensor/var.py:501, in _tensor_py_operators.__getitem__(self, args)
499 # Check if the number of dimensions isn't too large.
500 if self.ndim < index_dim_count:
--> 501 raise IndexError("too many indices for array")
503 # Convert an Ellipsis if provided into an appropriate number of
504 # slice(None).
505 if len(ellipses) > 1:
IndexError: too many indices for array
The structure of my dataframe is very similar to the food & beverages example in your answer. This is my code so far:
def make_next_level_hierarchy_variable(name, mu, alpha, beta, mapping=None, sigma_dims=None, offset_dims=None):
    sigma_ = pm.Gamma(f'{name}_sigma', alpha=alpha, beta=beta, dims=sigma_dims)
    offset = pm.Normal(f'{name}_offset', dims=offset_dims)
    if mapping is None:
        return pm.Deterministic(f'{name}', mu[:, None] + sigma_[:, None] * offset, dims=offset_dims)
    else:
        return pm.Deterministic(f'{name}', mu[:, mapping] + sigma_[:, mapping] * offset, dims=offset_dims)
# Get edges
level_municipality_idx, level_municipality_labels = pd.factorize(df_work['cve_mun'])
level_sepomex_idx, level_sepomex_labels = pd.factorize(df_work['id_sepomex'])
level_geo_idx, level_geo_labels = pd.factorize(df_work['cvegeo'])
# Dictionaries of edges
# get dictionary of unique edges
df_edges = df_work[['cve_mun', 'id_sepomex', 'cvegeo']].drop_duplicates()
# level municipality to sepomex
level_municipality_by_sepomex = (
    df_edges[['cve_mun', 'id_sepomex']]
    .drop_duplicates()
    .set_index('id_sepomex')['cve_mun']
    .sort_index()
    .map({label: idx for idx, label in enumerate(level_municipality_labels)})
    .values
)
# level sepomex to geo
level_sepomex_by_geo = (
    df_edges[['id_sepomex', 'cvegeo']]
    .drop_duplicates()
    .set_index('cvegeo')['id_sepomex']
    .sort_index()
    .map({label: idx for idx, label in enumerate(level_sepomex_labels)})
    .values
)
# build hierarchical model telescoping mun -> sepomex -> geo
with pm.Model() as hierarchical_model:
    # Hyperpriors
    mu_mun = pm.Normal('mu_mun', mu=0, sigma=1, dims=['cve_mun'])
    # Priors
    # sepomex
    sepomex_effect = make_next_level_hierarchy_variable(
        'sepomex_effect',
        mu_mun,
        alpha=10,
        beta=1,
        mapping=level_municipality_by_sepomex,
        sigma_dims=['sepomex']
    )
    # geo
    geo_effect = make_next_level_hierarchy_variable(
        'geo_effect',
        sepomex_effect,
        alpha=10,
        beta=1,
        mapping=level_sepomex_by_geo,
        sigma_dims=['cvegeo']
    )
    # Likelihood
    sigma_price = pm.Exponential('sigma_price', 100, dims='obs_id')
    mu_price = pm.Deterministic('mu_price', geo_effect, dims='obs_id')
    price = pm.LogNormal('price', mu=mu_price, sigma=sigma_price, observed=df_work['price'], dims='obs_id')
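The IndexError above looks consistent with mu being one-dimensional: mu_mun has a single dim (cve_mun), so mu[:, mapping] asks for two axes. Here is a minimal NumPy sketch of the same indexing pattern, under the assumption that PyTensor follows NumPy indexing semantics in this case. All sizes, values and the two scalar scales are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 municipalities, 3 sepomex codes, 5 geo units
mu_mun = rng.normal(size=2)                 # one value per municipality, 1-D
mun_by_sepomex = np.array([0, 0, 1])        # parent municipality of each sepomex
sepomex_by_geo = np.array([0, 1, 1, 2, 2])  # parent sepomex of each geo unit

# Two indices on a 1-D array fail, like mu[:, mapping] in the traceback
try:
    mu_mun[:, mun_by_sepomex]
    raised = False
except IndexError:
    raised = True

# A single fancy index copies each parent value down to its children
sigma_sepomex, sigma_geo = 0.5, 0.2  # stand-ins for the Gamma-distributed scales
sepomex_effect = mu_mun[mun_by_sepomex] + sigma_sepomex * rng.normal(size=3)
geo_effect = sepomex_effect[sepomex_by_geo] + sigma_geo * rng.normal(size=5)
```

If this reading is right, the analogous change in the helper would be mu[mapping] rather than mu[:, mapping], with offset_dims set to the child level so that the offset has one entry per child.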
- I don't fully understand in which part of the pm.Model() block I connect level_0 with level_1, and level_1 with level_2. Also, which coords should I use inside pm.Model()? Could you please share how you would build the model in the with part?
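On the coords question, one possible sketch, assuming coords is simply a dict from dimension name to label array that would later be passed as pm.Model(coords=coords). The toy dataframe is invented; the dimension names are the ones used in the code above. sort=True is an assumption added here so that the factorized labels come out in the same order as the sort_index() mapping:

```python
import pandas as pd

# Hypothetical toy data with the same three nested columns
df_work = pd.DataFrame({
    "cve_mun":    ["m1", "m1", "m2", "m2", "m2"],
    "id_sepomex": ["s2", "s1", "s3", "s3", "s3"],
    "cvegeo":     ["g3", "g1", "g4", "g2", "g4"],
})

# sort=True so codes follow sorted label order, matching sort_index() below
mun_idx, mun_labels = pd.factorize(df_work["cve_mun"], sort=True)
sepomex_idx, sepomex_labels = pd.factorize(df_work["id_sepomex"], sort=True)
geo_idx, geo_labels = pd.factorize(df_work["cvegeo"], sort=True)

# Parent municipality code for each sepomex, ordered like sepomex_labels
mun_by_sepomex = (
    df_work[["cve_mun", "id_sepomex"]]
    .drop_duplicates()
    .set_index("id_sepomex")["cve_mun"]
    .sort_index()
    .map({label: idx for idx, label in enumerate(mun_labels)})
    .values
)

# One entry per dimension name used in dims=...
coords = {
    "cve_mun": mun_labels,
    "sepomex": sepomex_labels,
    "cvegeo": geo_labels,
    "obs_id": df_work.index,
}
```

In this sketch the levels are connected through the mapping arrays (parent index per child), and geo_idx would index the geo-level effect per observation, for example geo_effect[geo_idx], so that mu_price has one entry per obs_id.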
Thanks a lot!
RAVJ