I have this model which seems to perform fairly well, i’d like to plot_ppc for specific sub_groups though, which i can’t figure out how to do using az.plot_ppc. How do I create a ppc plot for each sub_group?
Create mappings
subgroup_to_prodgroup = first_deliveries_df[[‘groupCode2’, ‘groupCode1’]].drop_duplicates()
Reindex using factorize indices
_, sub_groups = pd.factorize(subgroup_to_prodgroup[‘groupCode2’])
_, prod_groups = pd.factorize(subgroup_to_prodgroup[‘groupCode1’])Create mapping: for each subgroup, which product group index
subgroup_to_prodgroup_idx = pd.Series(prod_groups).values
Factorize the product groups and subgroups first
prod_group_idx, prod_groups = pd.factorize(first_deliveries_df[‘groupCode1’])
sub_group_idx, sub_groups = pd.factorize(first_deliveries_df[‘groupCode2’])For each sub_group, find corresponding prod_group index
subgroup_df = first_deliveries_df[[‘groupCode1’, ‘groupCode2’]].drop_duplicates()
subgroup_df[‘prod_group_idx’] = pd.factorize(subgroup_df[‘groupCode1’])[0]
subgroup_df[‘sub_group_idx’] = pd.factorize(subgroup_df[‘groupCode2’])[0]Create an array: subgroup_idx → prod_group_idx
mapping_array = subgroup_df.sort_values(‘sub_group_idx’)[‘prod_group_idx’].values
coords = {‘sub_groups’: sub_groups, ‘prod_groups’ : prod_groups}
with pm.Model(coords=coords) as lead_times_model:
# Hyper priors
mu = pm.Normal(“mu”, mu=100, sigma=30)
sigma = pm.HalfCauchy(“sigma”, beta=30)# Parameters for Groups mu_group = pm.Normal('mu_group', mu=mu, sigma=30, dims="prod_groups") sigma_group = pm.HalfCauchy('sigma_group', beta=30, dims="prod_groups") # Parameters for Subgroups mu_sub_group = pm.Normal('mu_sub_group', mu=mu_group[mapping_array], sigma=30, dims="sub_groups") sigma_sub_group = pm.HalfCauchy('sigma_sub_group', beta=30, dims="sub_groups") # Noise term sigma_obs = pm.HalfCauchy("sigma_obs", beta=30) mu_exp = pm.Deterministic('mu_exp', pm.math.exp(mu_sub_group), dims="sub_groups") # Likelihood lead_time_obs = pm.LogNormal( "lead_time_obs", mu=mu_sub_group[sub_group_idx], sigma=sigma_obs, observed=first_deliveries_df['actualLeadTimeDays'].values ) # Sampling idata_lt = pm.sample(return_inferencedata=True) idata_lt.extend(pm.sample_posterior_predictive(idata_lt))