Cannot plot ppc for specific sub_groups

I have this model, which seems to perform fairly well. I’d like to plot the posterior predictive check for specific sub_groups, though, and I can’t figure out how to do that with az.plot_ppc. How do I create a PPC plot for each sub_group?

# Factorize the product groups and subgroups over all observations
prod_group_idx, prod_groups = pd.factorize(first_deliveries_df['groupCode1'])
sub_group_idx, sub_groups = pd.factorize(first_deliveries_df['groupCode2'])

# For each sub_group, find the corresponding prod_group index,
# encoding both columns with the same uniques as above
subgroup_df = first_deliveries_df[['groupCode1', 'groupCode2']].drop_duplicates()
subgroup_df['prod_group_idx'] = prod_groups.get_indexer(subgroup_df['groupCode1'])
subgroup_df['sub_group_idx'] = sub_groups.get_indexer(subgroup_df['groupCode2'])

# Create an array: subgroup_idx -> prod_group_idx
# (assumes each subgroup belongs to exactly one product group)
mapping_array = subgroup_df.sort_values('sub_group_idx')['prod_group_idx'].values
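To sanity-check that mapping_array lines up with the observation-level codes, the same steps can be run on a small frame. The data below is hypothetical (it only reuses the column names from the question); the key invariant is that mapping_array[sub_group_idx] reproduces prod_group_idx for every row.

```python
import pandas as pd

# Hypothetical toy data standing in for first_deliveries_df
first_deliveries_df = pd.DataFrame({
    "groupCode1": ["P1", "P1", "P2", "P1", "P2"],
    "groupCode2": ["1107", "1110", "2001", "1107", "2001"],
    "actualLeadTimeDays": [12.0, 15.0, 30.0, 11.0, 28.0],
})

# Codes for every observation, plus the unique labels in order of first appearance
prod_group_idx, prod_groups = pd.factorize(first_deliveries_df["groupCode1"])
sub_group_idx, sub_groups = pd.factorize(first_deliveries_df["groupCode2"])

# One row per (product group, subgroup) pair, encoded with the uniques above
pairs = first_deliveries_df[["groupCode1", "groupCode2"]].drop_duplicates()
pairs = pairs.assign(
    prod_group_idx=pd.Index(prod_groups).get_indexer(pairs["groupCode1"]),
    sub_group_idx=pd.Index(sub_groups).get_indexer(pairs["groupCode2"]),
)

# Each subgroup must map to exactly one product group for the lookup to be well defined
assert pairs["sub_group_idx"].is_unique, "a subgroup appears under several product groups"

# mapping_array[j] is the product-group index of subgroup j
mapping_array = pairs.sort_values("sub_group_idx")["prod_group_idx"].to_numpy()
```

If the is_unique check fails on the real data, the two-level hierarchy is not nested and mu_group[mapping_array] would silently pick one of the product groups.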

coords = {'sub_groups': sub_groups, 'prod_groups' : prod_groups}

with pm.Model(coords=coords) as lead_times_model:
    # Hyper priors
    mu = pm.Normal("mu", mu=100, sigma=30)
    sigma = pm.HalfCauchy("sigma", beta=30)

    # Parameters for Groups
    mu_group = pm.Normal('mu_group', mu=mu, sigma=30, dims="prod_groups")
    sigma_group = pm.HalfCauchy('sigma_group', beta=30, dims="prod_groups")

    # Parameters for Subgroups
    mu_sub_group = pm.Normal('mu_sub_group', mu=mu_group[mapping_array], sigma=30, dims="sub_groups")
    sigma_sub_group = pm.HalfCauchy('sigma_sub_group', beta=30, dims="sub_groups")

    # Noise term
    sigma_obs = pm.HalfCauchy("sigma_obs", beta=30)

    mu_exp = pm.Deterministic('mu_exp', pm.math.exp(mu_sub_group), dims="sub_groups")

    # Likelihood
    lead_time_obs = pm.LogNormal(
        "lead_time_obs", 
        mu=mu_sub_group[sub_group_idx], 
        sigma=sigma_obs,
        observed=first_deliveries_df['actualLeadTimeDays'].values
    )

    # Sampling
    idata_lt = pm.sample(return_inferencedata=True)
    idata_lt.extend(pm.sample_posterior_predictive(idata_lt))

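Look at the third-last and second-last examples on the arviz.plot_ppc page of the ArviZ dev documentation. You have to build the coordinate labels from the index stored in the constant data group, and use the flatten argument (flatten=[]) so that each selected subgroup gets its own panel.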

Thanks @zweli!

So this should work?

obs_sub_group = idata_lt.posterior["sub_groups"][idata_lt.constant_data["sub_group_idx"]]
idata_lt = idata_lt.assign_coords(obs_id=obs_sub_group, groups="observed_vars")
az.plot_ppc(idata_lt, coords={'obs_id': ['1107', '1110']}, flatten=[])

Because I get the error:

AttributeError: 'InferenceData' object has no attribute 'constant_data'

Compare the example InferenceData with yours: in the example, the index information is stored in the constant_data group. That code should therefore work once you register sub_group_idx as a pm.Data object, which is then stored in the constant_data group. See Using Data Containers in the PyMC example gallery for more on Data containers; besides storing data in the InferenceData, they have other benefits.

To avoid Data containers, use the variable holding the indexes directly instead, i.e. replace idata_lt.constant_data["sub_group_idx"] with sub_group_idx.
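A sketch of the route without Data containers, assuming the observed variable carries an obs_id dim (as with dims="obs_id" on the likelihood). Here the labels are built by indexing the plain sub_groups array returned by pd.factorize with sub_group_idx, which keeps the result one-dimensional over observations. Instead of a real sampling run, it uses a small hypothetical InferenceData built with az.from_dict; the label values and array sizes are made up.

```python
import numpy as np
import arviz as az

# Hypothetical stand-ins for the factorize output in the question
sub_groups = np.array(["1107", "1110", "2001"])
sub_group_idx = np.array([0, 1, 2, 0, 2])

# Toy InferenceData: 2 chains x 50 draws of posterior predictive for 5 observations
rng = np.random.default_rng(0)
idata_lt = az.from_dict(
    posterior_predictive={"lead_time_obs": rng.lognormal(size=(2, 50, 5))},
    observed_data={"lead_time_obs": rng.lognormal(size=5)},
    dims={"lead_time_obs": ["obs_id"]},
)

# Label every observation with its subgroup code, using the plain index array
obs_sub_group = sub_groups[sub_group_idx]
idata_lt = idata_lt.assign_coords(obs_id=obs_sub_group, groups="observed_vars")

# One panel per selected subgroup (needs matplotlib; uncomment when running interactively):
# az.plot_ppc(idata_lt, coords={"obs_id": ["1107", "1110"]}, flatten=[])
```

After relabelling, obs_id contains repeated labels, one per observation, which is the same shape of coordinate the ArviZ documentation example builds from constant_data.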