How do I assign coords to do a ppc for specific groups in my data?

Hello.

I’ve developed a sales forecasting model and would like to run some posterior predictive checks (and other analyses) on specific items. I’ve tried to follow this tutorial: arviz.plot_ppc — ArviZ 0.15.1 documentation

Specifically, this following code:

obs_county = data.posterior["County"][data.constant_data["county_idx"]]
data = data.assign_coords(obs_id=obs_county, groups="observed_vars")
az.plot_ppc(data, coords={'obs_id': ['ANOKA', 'BELTRAMI']}, flatten=[])

Since I cannot see the trace object, I’m not sure what I’m doing wrong. In the second line, which seems to assign the coordinates, where is “observed_vars” coming from?

Here is what I tried:

obs_item = idata.posterior["item"][idata.constant_data["ic_to_item_map_dim_0"]]
data = idata.assign_coords(obs_id=obs_item, groups="observed_vars")
az.plot_ppc(data, coords={'predicted_eaches_dim_0': ['100189K', '103821K', '110656K', '111228K','118185V', '110403K', '109849K']}, flatten=[])

This is the error it throws:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_33465/1696874978.py in <module>
      1 obs_item = idata.posterior["item"][idata.constant_data["ic_to_item_map_dim_0"]]
      2 data = idata.assign_coords(obs_id=obs_item, groups="observed_vars")
----> 3 az.plot_ppc(data, coords={'predicted_eaches_dim_0': ['100189K', '103821K', '110656K', '111228K','118185V', '110403K', '109849K']}, flatten=[])

/opt/conda/lib/python3.7/site-packages/arviz/plots/ppcplot.py in plot_ppc(data, kind, alpha, mean, observed, color, colors, grid, figsize, textsize, data_pairs, var_names, filter_vars, coords, flatten, flatten_pp, num_pp_samples, random_seed, jitter, animated, animation_kwargs, legend, labeller, ax, backend, backend_kwargs, group, show)
    354     # TODO: Add backend kwargs
    355     plot = get_plotting_function("plot_ppc", "ppcplot", backend)
--> 356     axes = plot(**ppcplot_kwargs)
    357     return axes

/opt/conda/lib/python3.7/site-packages/arviz/plots/backends/matplotlib/ppcplot.py in plot_ppc(ax, length_plotters, rows, cols, figsize, animated, obs_plotters, pp_plotters, predictive_dataset, pp_sample_ix, kind, alpha, colors, textsize, mean, observed, jitter, total_pp_samples, legend, labeller, group, animation_kwargs, num_pp_samples, backend_kwargs, show)
     98     backend_kwargs.setdefault("squeeze", True)
     99     if ax is None:
--> 100         fig, axes = create_axes_grid(length_plotters, rows, cols, backend_kwargs=backend_kwargs)
    101     else:
    102         axes = np.ravel(ax)

/opt/conda/lib/python3.7/site-packages/arviz/plots/backends/matplotlib/__init__.py in create_axes_grid(length_plotters, rows, cols, backend_kwargs)
     53     backend_kwargs = {**backend_kwarg_defaults(), **backend_kwargs}
     54 
---> 55     fig, axes = subplots(rows, cols, **backend_kwargs)
     56     extra = (rows * cols) - length_plotters
     57     if extra > 0:

/opt/conda/lib/python3.7/site-packages/matplotlib/pyplot.py in subplots(nrows, ncols, sharex, sharey, squeeze, subplot_kw, gridspec_kw, **fig_kw)
   1454     axs = fig.subplots(nrows=nrows, ncols=ncols, sharex=sharex, sharey=sharey,
   1455                        squeeze=squeeze, subplot_kw=subplot_kw,
-> 1456                        gridspec_kw=gridspec_kw)
   1457     return fig, axs
   1458 

/opt/conda/lib/python3.7/site-packages/matplotlib/figure.py in subplots(self, nrows, ncols, sharex, sharey, squeeze, subplot_kw, gridspec_kw)
    894         if gridspec_kw is None:
    895             gridspec_kw = {}
--> 896         gs = self.add_gridspec(nrows, ncols, figure=self, **gridspec_kw)
    897         axs = gs.subplots(sharex=sharex, sharey=sharey, squeeze=squeeze,
    898                           subplot_kw=subplot_kw)

/opt/conda/lib/python3.7/site-packages/matplotlib/figure.py in add_gridspec(self, nrows, ncols, **kwargs)
   1445 
   1446         _ = kwargs.pop('figure', None)  # pop in case user has added this...
-> 1447         gs = GridSpec(nrows=nrows, ncols=ncols, figure=self, **kwargs)
   1448         self._gridspecs.append(gs)
   1449         return gs

/opt/conda/lib/python3.7/site-packages/matplotlib/gridspec.py in __init__(self, nrows, ncols, figure, left, bottom, right, top, wspace, hspace, width_ratios, height_ratios)
    385         super().__init__(nrows, ncols,
    386                          width_ratios=width_ratios,
--> 387                          height_ratios=height_ratios)
    388 
    389     _AllowedKeys = ["left", "bottom", "right", "top", "wspace", "hspace"]

/opt/conda/lib/python3.7/site-packages/matplotlib/gridspec.py in __init__(self, nrows, ncols, height_ratios, width_ratios)
     51         if not isinstance(ncols, Integral) or ncols <= 0:
     52             raise ValueError(
---> 53                 f"Number of columns must be a positive integer, not {ncols!r}")
     54         self._nrows, self._ncols = nrows, ncols
     55         self.set_height_ratios(height_ratios)

ValueError: Number of columns must be a positive integer, not 0

I am not sure I follow this. If this is about the data object: the code examples are all part of the same “session”; it is data = az.load_arviz_data('radon'), loaded a couple of cells before that one.

Yes. The first line defines the new coordinate using the xarray label-based equivalent of NumPy “fancy” indexing; the second line updates the current coordinate values to these new ones (which are no longer a unique id and now have repeated values). “observed_vars” is a shortcut for ["posterior_predictive", "observed_data", "prior_predictive"].
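The “fancy indexing” step can be illustrated with plain Python lists (the county names and index values below are made up for illustration; the real ones come from the radon dataset):

```python
# data.posterior["County"][data.constant_data["county_idx"]] repeats each
# county label once per observation. Plain-list equivalent:
counties = ["AITKIN", "ANOKA", "BELTRAMI"]   # unique coord values
county_idx = [0, 1, 1, 2, 1]                 # one entry per observation
obs_county = [counties[i] for i in county_idx]
print(obs_county)  # ['AITKIN', 'ANOKA', 'ANOKA', 'BELTRAMI', 'ANOKA']
```

Note the result has repeated values, which is exactly what makes the later groupby possible.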

It looks like you are creating a new coordinate in the second line here instead of updating an existing one. This new coordinate is therefore not set as the “index” of anything, and “predicted_eaches_dim_0” probably still has the default integer arange coordinates, so the coords you are passing select an empty subset → the plot attempts to generate a 0x0 grid, which raises that error in matplotlib.
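A minimal sketch of why this ends in a 0-column grid (the matching below mimics the label selection against a default integer index; the values are illustrative):

```python
# "predicted_eaches_dim_0" still has default arange coordinates, so selecting
# string item labels matches nothing -> zero subplots get requested.
default_coords = list(range(7))                # xarray's default 0..n-1 index
requested = ["100189K", "103821K", "110656K"]  # string labels passed to coords
matches = [c for c in default_coords if c in requested]
print(len(matches))  # 0 -> matplotlib is asked for a grid with 0 columns
```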

Thank you. I changed to the following:

obs_item = idata.posterior["item"][idata.constant_data["ic_to_item_map_dim_0"]]
data = idata.assign_coords(obs_id=obs_item, groups="observed_vars")
az.plot_ppc(data, coords={'ic_to_item_map_dim_0': ['110656K']}, flatten=[], kind='cumulative')

This produces 40 graphs for one item, whereas I’m aiming for one graph per item. In the code above, I pull just one item.

Maybe this has to do with my model?

Here is my model for reference:

coords = {'business_line':business_lines,
              'category':categories,
              'subcategory':subcats,
              'ic':ic,
              'item':items,
              'time':times,
             'location':locations,
              'promo': promo,
              'giftset':giftset,
              'free_fin':free_fin,
              'dc_discount':dc_discount,
              # 'month':month,
              'cannibalization':cannibalization,
             # 'yearly_components':[f'yearly_{f}_{i+1}' for f in ['cos', 'sin'] for i in range(x_fourier.shape[1] // 2)],
             # 'weekly_components':[f'weekly_{f}_{i+1}' for f in ['cos', 'sin'] for i in range(x_fourier_week.shape[1] // 2)]
             }

    #TRAINING THE MODEL STARTS HERE
    print("fitting model...")
    with pm.Model(coords=coords) as constant_model:    
        #Data that does not change
        cat_to_bl_map = pm.Data('cat_to_bl_map', cat_to_bl_idx, mutable=False)
        subcat_to_cat_map = pm.Data('subcat_to_cat_map', subcat_to_cat_idx, mutable=False)
        ic_to_subcat_map = pm.Data('ic_to_subcat_map', ic_to_subcat_idx, mutable=False)
        ic_to_item_map = pm.Data('ic_to_item_map', ic_to_item_idx, mutable = False)

        #Data that does change
        pm_loc_idx = pm.Data('loc_idx', location_idx, mutable = True)
        pm_item_idx = pm.Data('item_idx', item_idx, mutable=True)
        pm_time_idx = pm.Data('time_idx', time_idx, mutable=True)
        observed_eaches = pm.Data('observed_eaches', df.residual, mutable=True)
        promo_ = pm.Data('promotion', promo_idx, mutable = True)
        cann_ = pm.Data('cannibalization', cann_idx, mutable = True)
        dc_discount_ = pm.Data('dc_discount', dc_idx, mutable = True)
        free_fin_ = pm.Data('free_fin', free_fin_idx, mutable = True)
        pvbv_ = pm.Data('pvbv', promo_pvbv_idx, mutable = True)
        giftset_ = pm.Data('giftset', giftset_idx, mutable = True)
        month_2_ = pm.Data('month_2', month_2_idx, mutable = True)
        month_3_ = pm.Data('month_3', month_3_idx ,mutable = True)
        month_4_ = pm.Data('month_4', month_4_idx ,mutable = True)
        month_5_ = pm.Data('month_5', month_5_idx ,mutable = True)
        month_6_ = pm.Data('month_6', month_6_idx ,mutable = True)
        month_7_ = pm.Data('month_7', month_7_idx ,mutable = True)
        month_8_ = pm.Data('month_8', month_8_idx ,mutable = True)
        month_9_ = pm.Data('month_9', month_9_idx ,mutable = True)
        month_10_ = pm.Data('month_10', month_10_idx,mutable = True)
        month_11_ = pm.Data('month_11', month_11_idx ,mutable = True)
        month_12_ = pm.Data('month_12', month_12_idx ,mutable = True)
        # dcost_local_change_ = pm.Data('dcost_local', price_idx, mutable = True)
        price_change_before_ = pm.Data('price_change_before', price_before_idx, mutable = True)
        price_change_on_ = pm.Data('price_change_on', price_before_idx, mutable = True)

        loc_intercept = pm.Normal('loc_intercept', mu = 0, sigma = 1, dims = ['location'])
        loc_bl = utility_functions.make_next_level_hierarchy_variable(name='loc_bl', mu=loc_intercept, alpha=2, beta=1, dims=['business_line', 'location'])
        loc_cat = utility_functions.make_next_level_hierarchy_variable(name='loc_cat', mu=loc_bl[cat_to_bl_map], alpha=2, beta=1, dims=['category', 'location'])
        loc_subcat = utility_functions.make_next_level_hierarchy_variable(name='loc_subcat', mu=loc_cat[subcat_to_cat_map], alpha=2, beta=1, dims=['subcategory', 'location'])
        loc_ic = utility_functions.make_next_level_hierarchy_variable(name='loc_ic', mu=loc_subcat[ic_to_subcat_map], alpha=2, beta=1, dims=['ic', 'location'])
        loc_item = utility_functions.make_next_level_hierarchy_variable(name='loc_item', mu=loc_ic[ic_to_item_map], alpha=2, beta=1, dims=['item', 'location'])

        mu_cann = pm.Normal('mu_cann', mu = 0, sigma = 1)
        bl_cann = utility_functions.make_next_level_hierarchy_variable(name='bl_cann', mu=mu_cann, alpha=2, beta=1, dims=['business_line'])
        cat_cann = utility_functions.make_next_level_hierarchy_variable(name='cat_cann', mu=bl_cann[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_cann = utility_functions.make_next_level_hierarchy_variable(name='subcat_cann', mu=cat_cann[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_cann = utility_functions.make_next_level_hierarchy_variable(name='ic_cann', mu=subcat_cann[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_cann = utility_functions.make_next_level_hierarchy_variable(name='item_cann', mu=ic_cann[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_dc_discount = pm.Normal('mu_dc_discount', mu = 0, sigma = 1)
        bl_dc_discount = utility_functions.make_next_level_hierarchy_variable(name='bl_dc_discount', mu=mu_dc_discount, alpha=2, beta=1, dims=['business_line'])
        cat_dc_discount = utility_functions.make_next_level_hierarchy_variable(name='cat_dc_discount', mu=bl_dc_discount[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_dc_discount = utility_functions.make_next_level_hierarchy_variable(name='subcat_dc_discount', mu=cat_dc_discount[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_dc_discount = utility_functions.make_next_level_hierarchy_variable(name='ic_dc_discount', mu=subcat_dc_discount[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_dc_discount = utility_functions.make_next_level_hierarchy_variable(name='item_dc_discount', mu=ic_dc_discount[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_free_fin = pm.Normal('mu_free_fin', mu = 0, sigma = 1)
        bl_free_fin = utility_functions.make_next_level_hierarchy_variable(name='bl_free_fin', mu=mu_free_fin, alpha=2, beta=1, dims=['business_line'])
        cat_free_fin = utility_functions.make_next_level_hierarchy_variable(name='cat_free_fin', mu=bl_free_fin[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_free_fin = utility_functions.make_next_level_hierarchy_variable(name='subcat_free_fin', mu=cat_free_fin[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_free_fin = utility_functions.make_next_level_hierarchy_variable(name='ic_free_fin', mu=subcat_free_fin[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_free_fin = utility_functions.make_next_level_hierarchy_variable(name='item_free_fin', mu=ic_free_fin[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_pvbv = pm.Normal('mu_pvbv', mu = 0, sigma = 1)
        bl_pvbv = utility_functions.make_next_level_hierarchy_variable(name='bl_pvbv', mu=mu_pvbv, alpha=2, beta=1, dims=['business_line'])
        cat_pvbv = utility_functions.make_next_level_hierarchy_variable(name='cat_pvbv', mu=bl_pvbv[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_pvbv = utility_functions.make_next_level_hierarchy_variable(name='subcat_pvbv', mu=cat_pvbv[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_pvbv = utility_functions.make_next_level_hierarchy_variable(name='ic_pvbv', mu=subcat_pvbv[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_pvbv = utility_functions.make_next_level_hierarchy_variable(name='item_pvbv', mu=ic_pvbv[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_giftset = pm.Normal('mu_giftset', mu = 0, sigma = 1)
        bl_giftset = utility_functions.make_next_level_hierarchy_variable(name='bl_giftset', mu=mu_giftset, alpha=2, beta=1, dims=['business_line'])
        cat_giftset = utility_functions.make_next_level_hierarchy_variable(name='cat_giftset', mu=bl_giftset[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_giftset = utility_functions.make_next_level_hierarchy_variable(name='subcat_giftset', mu=cat_giftset[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_giftset = utility_functions.make_next_level_hierarchy_variable(name='ic_giftset', mu=subcat_giftset[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_giftset = utility_functions.make_next_level_hierarchy_variable(name='item_giftset', mu=ic_giftset[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_month2 = pm.Normal('mu_month2', mu = 0, sigma = 1)
        bl_month2 = utility_functions.make_next_level_hierarchy_variable(name='bl_month2', mu=mu_month2, alpha=2, beta=1, dims=['business_line'])
        cat_month2 = utility_functions.make_next_level_hierarchy_variable(name='cat_month2', mu=bl_month2[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_month2 = utility_functions.make_next_level_hierarchy_variable(name='subcat_month2', mu=cat_month2[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_month2 = utility_functions.make_next_level_hierarchy_variable(name='ic_month2', mu=subcat_month2[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_month2 = utility_functions.make_next_level_hierarchy_variable(name='item_month2', mu=ic_month2[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_month3 = pm.Normal('mu_month3', mu = 0, sigma = 1)
        bl_month3 = utility_functions.make_next_level_hierarchy_variable(name='bl_month3', mu=mu_month3, alpha=2, beta=1, dims=['business_line'])
        cat_month3 = utility_functions.make_next_level_hierarchy_variable(name='cat_month3', mu=bl_month3[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_month3 = utility_functions.make_next_level_hierarchy_variable(name='subcat_month3', mu=cat_month3[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_month3 = utility_functions.make_next_level_hierarchy_variable(name='ic_month3', mu=subcat_month3[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_month3 = utility_functions.make_next_level_hierarchy_variable(name='item_month3', mu=ic_month3[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_month4 = pm.Normal('mu_month4', mu = 0, sigma = 1)
        bl_month4 = utility_functions.make_next_level_hierarchy_variable(name='bl_month4', mu=mu_month4, alpha=2, beta=1, dims=['business_line'])
        cat_month4 = utility_functions.make_next_level_hierarchy_variable(name='cat_month4', mu=bl_month4[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_month4 = utility_functions.make_next_level_hierarchy_variable(name='subcat_month4', mu=cat_month4[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_month4 = utility_functions.make_next_level_hierarchy_variable(name='ic_month4', mu=subcat_month4[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_month4 = utility_functions.make_next_level_hierarchy_variable(name='item_month4', mu=ic_month4[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_month5 = pm.Normal('mu_month5', mu = 0, sigma = 1)
        bl_month5 = utility_functions.make_next_level_hierarchy_variable(name='bl_month5', mu=mu_month5, alpha=2, beta=1, dims=['business_line'])
        cat_month5 = utility_functions.make_next_level_hierarchy_variable(name='cat_month5', mu=bl_month5[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_month5 = utility_functions.make_next_level_hierarchy_variable(name='subcat_month5', mu=cat_month5[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_month5 = utility_functions.make_next_level_hierarchy_variable(name='ic_month5', mu=subcat_month5[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_month5 = utility_functions.make_next_level_hierarchy_variable(name='item_month5', mu=ic_month5[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_month6 = pm.Normal('mu_month6', mu = 0, sigma = 1)
        bl_month6 = utility_functions.make_next_level_hierarchy_variable(name='bl_month6', mu=mu_month6, alpha=2, beta=1, dims=['business_line'])
        cat_month6 = utility_functions.make_next_level_hierarchy_variable(name='cat_month6', mu=bl_month6[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_month6 = utility_functions.make_next_level_hierarchy_variable(name='subcat_month6', mu=cat_month6[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_month6 = utility_functions.make_next_level_hierarchy_variable(name='ic_month6', mu=subcat_month6[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_month6 = utility_functions.make_next_level_hierarchy_variable(name='item_month6', mu=ic_month6[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_month7 = pm.Normal('mu_month7', mu = 0, sigma = 1)
        bl_month7 = utility_functions.make_next_level_hierarchy_variable(name='bl_month7', mu=mu_month7, alpha=2, beta=1, dims=['business_line'])
        cat_month7 = utility_functions.make_next_level_hierarchy_variable(name='cat_month7', mu=bl_month7[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_month7 = utility_functions.make_next_level_hierarchy_variable(name='subcat_month7', mu=cat_month7[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_month7 = utility_functions.make_next_level_hierarchy_variable(name='ic_month7', mu=subcat_month7[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_month7 = utility_functions.make_next_level_hierarchy_variable(name='item_month7', mu=ic_month7[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_month8 = pm.Normal('mu_month8', mu = 0, sigma = 1)
        bl_month8 = utility_functions.make_next_level_hierarchy_variable(name='bl_month8', mu=mu_month8, alpha=2, beta=1, dims=['business_line'])
        cat_month8 = utility_functions.make_next_level_hierarchy_variable(name='cat_month8', mu=bl_month8[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_month8 = utility_functions.make_next_level_hierarchy_variable(name='subcat_month8', mu=cat_month8[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_month8 = utility_functions.make_next_level_hierarchy_variable(name='ic_month8', mu=subcat_month8[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_month8 = utility_functions.make_next_level_hierarchy_variable(name='item_month8', mu=ic_month8[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_month9 = pm.Normal('mu_month9', mu = 0, sigma = 5)
        bl_month9 = utility_functions.make_next_level_hierarchy_variable(name='bl_month9', mu=mu_month9, alpha=2, beta=1, dims=['business_line'])
        cat_month9 = utility_functions.make_next_level_hierarchy_variable(name='cat_month9', mu=bl_month9[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_month9 = utility_functions.make_next_level_hierarchy_variable(name='subcat_month9', mu=cat_month9[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_month9 = utility_functions.make_next_level_hierarchy_variable(name='ic_month9', mu=subcat_month9[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_month9 = utility_functions.make_next_level_hierarchy_variable(name='item_month9', mu=ic_month9[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_month10 = pm.Normal('mu_month10', mu = 0, sigma = 1)
        bl_month10 = utility_functions.make_next_level_hierarchy_variable(name='bl_month10', mu=mu_month10, alpha=2, beta=1, dims=['business_line'])
        cat_month10 = utility_functions.make_next_level_hierarchy_variable(name='cat_month10', mu=bl_month10[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_month10 = utility_functions.make_next_level_hierarchy_variable(name='subcat_month10', mu=cat_month10[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_month10 = utility_functions.make_next_level_hierarchy_variable(name='ic_month10', mu=subcat_month10[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_month10 = utility_functions.make_next_level_hierarchy_variable(name='item_month10', mu=ic_month10[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_month11 = pm.Normal('mu_month11', mu = 0, sigma = 1)
        bl_month11 = utility_functions.make_next_level_hierarchy_variable(name='bl_month11', mu=mu_month11, alpha=2, beta=1, dims=['business_line'])
        cat_month11 = utility_functions.make_next_level_hierarchy_variable(name='cat_month11', mu=bl_month11[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_month11 = utility_functions.make_next_level_hierarchy_variable(name='subcat_month11', mu=cat_month11[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_month11 = utility_functions.make_next_level_hierarchy_variable(name='ic_month11', mu=subcat_month11[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_month11 = utility_functions.make_next_level_hierarchy_variable(name='item_month11', mu=ic_month11[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu_month12 = pm.Normal('mu_month12', mu = 0, sigma = 1)
        bl_month12 = utility_functions.make_next_level_hierarchy_variable(name='bl_month12', mu=mu_month12, alpha=2, beta=1, dims=['business_line'])
        cat_month12 = utility_functions.make_next_level_hierarchy_variable(name='cat_month12', mu=bl_month12[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_month12 = utility_functions.make_next_level_hierarchy_variable(name='subcat_month12', mu=cat_month12[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_month12 = utility_functions.make_next_level_hierarchy_variable(name='ic_month12', mu=subcat_month12[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_month12 = utility_functions.make_next_level_hierarchy_variable(name='item_month12', mu=ic_month12[ic_to_item_map], alpha=2, beta=1,  dims=['item'])
        
        mu_price_change = pm.Normal('mu_price_change', mu = 0, sigma = 1)
        bl_price_change = utility_functions.make_next_level_hierarchy_variable(name='bl_price_change', mu=mu_price_change, alpha=2, beta=1, dims=['business_line'])
        cat_price_change = utility_functions.make_next_level_hierarchy_variable(name='cat_price_change', mu=bl_price_change[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_price_change = utility_functions.make_next_level_hierarchy_variable(name='subcat_price_change', mu=cat_price_change[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_price_change = utility_functions.make_next_level_hierarchy_variable(name='ic_price_change', mu=subcat_price_change[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_price_change = utility_functions.make_next_level_hierarchy_variable(name='item_price_change', mu=ic_price_change[ic_to_item_map], alpha=2, beta=1,  dims=['item'])
        
        mu_price_change_after = pm.Normal('mu_price_change_after', mu = 0, sigma = 1)
        bl_price_change_after = utility_functions.make_next_level_hierarchy_variable(name='bl_price_change_after', mu=mu_price_change_after, alpha=2, beta=1, dims=['business_line'])
        cat_price_change_after = utility_functions.make_next_level_hierarchy_variable(name='cat_price_change_after', mu=bl_price_change_after[cat_to_bl_map], alpha=2, beta=1,  dims=['category'])
        subcat_price_change_after = utility_functions.make_next_level_hierarchy_variable(name='subcat_price_change_after', mu=cat_price_change_after[subcat_to_cat_map],  alpha=2, beta=1, dims=['subcategory'])
        ic_price_change_after = utility_functions.make_next_level_hierarchy_variable(name='ic_price_change_after', mu=subcat_price_change_after[ic_to_subcat_map],  alpha=2, beta=1, dims=['ic'])
        item_price_change_after = utility_functions.make_next_level_hierarchy_variable(name='item_price_change_after', mu=ic_price_change_after[ic_to_item_map], alpha=2, beta=1,  dims=['item'])

        mu = (loc_item[pm_item_idx, pm_loc_idx] + item_cann[pm_item_idx]*cann_  + item_dc_discount[pm_item_idx]*dc_discount_
             + item_free_fin[pm_item_idx]*free_fin_ + item_pvbv[pm_item_idx]*pvbv_ + item_giftset[pm_item_idx]*giftset_ + item_month2[pm_item_idx]*month_2_
             + item_month3[pm_item_idx]*month_3_ + item_month4[pm_item_idx]*month_4_ + item_month5[pm_item_idx]*month_5_ + item_month6[pm_item_idx]*month_6_
              + item_month7[pm_item_idx]*month_7_ + item_month8[pm_item_idx]*month_8_ + item_month9[pm_item_idx]*month_9_ + item_month10[pm_item_idx]*month_10_
              + item_month11[pm_item_idx]*month_11_ + item_month12[pm_item_idx]*month_12_ + item_price_change[pm_item_idx]*price_change_before_
              + item_price_change_after[pm_item_idx]*price_change_on_)

        sigma = pm.HalfNormal('sigma', sigma=100)

        eaches = pm.StudentT('predicted_eaches',
                             mu=mu,
                             sigma=sigma,
                             nu=15,
                             # lower = 0,
                             observed=observed_eaches)

I’m not sure what is causing so many graphs.

I don’t know what obs_item is supposed to be, but note that right now you are assigning it to obs_id, which does not look like a coordinate in your posterior predictive. You want to update the existing coordinates (an integer id in the example case) with something else that then has repeated values for the groupbys and similar operations.

I’ll try to extend the example in the docs to hopefully make it clearer. I am afraid I can’t work with your exact model, both because it is not reproducible and because it would take too much time to try to understand all the terms.

I will use the centered_eight dataset. You can load it with:

import arviz as az
idata = az.load_arviz_data("centered_eight")

The groups we care about for plot_ppc are posterior_predictive and observed_data. Here is how they look for this dataset. The posterior predictive is:

<xarray.Dataset>
Dimensions:  (chain: 4, draw: 500, school: 8)
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499
  * school   (school) object 'Choate' 'Deerfield' ... "St. Paul's" 'Mt. Hermon'
Data variables:
    obs      (chain, draw, school) float64 ...

and the observed_data is:

<xarray.Dataset>
Dimensions:  (school: 8)
Coordinates:
  * school   (school) object 'Choate' 'Deerfield' ... "St. Paul's" 'Mt. Hermon'
Data variables:
    obs      (school) float64 ...

The dimensions we can “play with” are the ones present in both groups: that is, the dimensions of observed_data, which are also the dimensions of posterior_predictive minus the “chain” and “draw” ones. In this case that is only school. The default behaviour is to flatten all these dimensions in the plot, thus generating a single plot:

az.plot_ppc(idata)

(figure: PPC plot with all schools flattened into a single panel)
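The “dimensions present in both groups” rule can be sketched with plain sets standing in for the real Datasets:

```python
# Dims eligible for facetting = dims of observed_data, which equal the dims
# of posterior_predictive minus the sampling dims "chain" and "draw".
pp_dims = {"chain", "draw", "school"}   # posterior_predictive dims
obs_dims = {"school"}                   # observed_data dims
plottable = (pp_dims - {"chain", "draw"}) & obs_dims
print(plottable)  # {'school'}
```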

Consequently, explicitly indicating we want to flatten the school dimension generates the same plot; the code below is equivalent:

az.plot_ppc(idata, flatten=["school"])

IMPORTANT: even if the default is to flatten all the dimensions, multiple plots might still be generated, for example when there are multiple observed variables. Try loading the "rugby" dataset with load_arviz_data instead and see what the default plot_ppc produces.


As we can see in the printouts above, the school dimension has length 8. If we use flatten=[] to indicate that no dimension should be flattened, we will generate 8 plots. Each plot will then have only a single observation assigned to it, so we cannot generate a KDE for the observations nor for the predictions. Trying to do

az.plot_ppc(idata, flatten=[]);

generates 8 subplots, all labeled but empty. What we can do is plot the actual data as multiple rug plots, for example. We can do that with the kind argument:

az.plot_ppc(idata, flatten=[], kind='scatter')

The bottom dot is the observation for each of the 8 schools, the dots above are predictions for that observation from 5 randomly selected posterior draws, and the orange line is the KDE of the predictions for that observation across all posterior draws. So even though each prediction is a scalar, the predictions vary between draws, which lets us generate a KDE.

Now we get to modifying the coordinate values. What would happen if, instead of having 8 different schools as coordinate values (more abstractly, unique coord values), we had non-unique coord values? Then ArviZ will group observations by coordinate value and generate a plot for each unique value.
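A pure-Python sketch of that grouping rule (ArviZ does this with xarray internally; the labels below match the assign_coords call that follows):

```python
from collections import defaultdict

school = ["a", "b", "b", "a", "b", "b", "a", "b"]  # non-unique coord values
panels = defaultdict(list)
for i, value in enumerate(school):
    panels[value].append(i)  # each unique value collects its observation indices
print(dict(panels))  # {'a': [0, 3, 6], 'b': [1, 2, 4, 5, 7]} -> one subplot each
```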
We can use the assign_coords method to modify the coordinate values of the school dimension:

idata = idata.assign_coords(
    school=["a", "b", "b", "a", "b", "b", "a", "b"],
    groups=["observed_data", "posterior_predictive"]
)

We are modifying both the observed_data and posterior_predictive groups at the same time. Here is how the posterior predictive looks now:

<xarray.Dataset>
Dimensions:  (chain: 4, draw: 500, school: 8)
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499
  * school   (school) <U1 'a' 'b' 'b' 'a' 'b' 'b' 'a' 'b'
Data variables:
    obs      (chain, draw, school) float64 ...

Same dimensions, same data, but different coordinate values for the school dimension. Now, if we do:

az.plot_ppc(idata, flatten=[]);

this generates one subplot per unique coordinate value (here two: one for "a" and one for "b"). We can combine that with coords to plot only a subset of the unique values. For example, to plot only the observations that correspond to the "a" coord value we can do:

az.plot_ppc(idata, flatten=[], coords={"school": "a"});

(figure: PPC plot for the "a" group only)


It is also important to note that coordinate values can be anything. I could also do

idata = idata.assign_coords(
    obs_id=[3, 4, 2, 4, 5, 5],  # note also different length
    groups=["observed_data", "posterior_predictive"]
)

now, the posterior predictive looks like:

<xarray.Dataset>
Dimensions:  (chain: 4, draw: 500, school: 8, obs_id: 6)
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 ... 492 493 494 495 496 497 498 499
  * school   (school) <U1 'a' 'b' 'b' 'a' 'b' 'b' 'a' 'b'
  * obs_id   (obs_id) int64 3 4 2 4 5 5
Data variables:
    obs      (chain, draw, school) float64 ...

There is an obs_id dimension with some coordinate values, but it is not a dimension of our variable (see the last line of the excerpt above: the dimensions of obs are chain, draw and school). So it is not indexing anything and is completely unrelated to our data, which means it can't be used for faceting in plot_ppc.
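This mismatch can be reproduced with plain xarray. The Dataset below is synthetic, mimicking the 8-schools shapes above (with fewer draws to keep it small):

```python
import numpy as np
import xarray as xr

# "obs" has dims (chain, draw, school), like the posterior predictive above
ds = xr.Dataset(
    {"obs": (("chain", "draw", "school"), np.random.rand(4, 10, 8))},
    coords={"school": list("abbabbab")},
)
# assign obs_id along its own, new dimension (note the different length)
ds = ds.assign_coords(obs_id=("obs_id", [3, 4, 2, 4, 5, 5]))

print(ds["obs"].dims)  # ('chain', 'draw', 'school'): obs_id is absent
```

The obs_id dimension now exists in the Dataset, but since it does not appear in the dims of obs it cannot index that variable, which is exactly why plot_ppc cannot facet on it.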

In your specific case, you need to:

  • identify the dimension whose coordinate values you want to update, maybe ic_to_item_map_dim_0, which is what you are using in coords for the plot?
  • define the new coordinate values, either with the obs_item line as in the docs example or manually as I have done here. This might already work in your code example; double check it is a 1d array/DataArray with the same length as the dimension you identified above.
  • call plot_ppc with flatten=[] to trigger the groupby instead of flattening all dimensions irrespective of their coordinate values.
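As a sketch of the first two steps on synthetic data (the names here, items, the observation-to-item map, predicted_eaches_dim_0, mirror the thread, but the values are made up):

```python
import numpy as np
import xarray as xr

# Hypothetical stand-ins: 3 items and a 6-long observation-to-item map,
# playing the role of constant_data["ic_to_item_map_dim_0"]
items = xr.DataArray(["100189K", "103821K", "110656K"], dims="item")
item_idx = np.array([0, 0, 1, 2, 2, 1])

# Step 2: one item label per observation (a 1d array, same length as the dim)
obs_item = items[item_idx]

# A toy posterior-predictive dataset with the dimension identified in step 1
pp = xr.Dataset(
    {"predicted_eaches": (("chain", "draw", "predicted_eaches_dim_0"),
                          np.random.rand(2, 5, 6))}
)
# use .values so the labels attach to the existing dimension
pp = pp.assign_coords(predicted_eaches_dim_0=obs_item.values)
print(pp["predicted_eaches_dim_0"].values.tolist())
```

With coordinate values like these in an InferenceData object, az.plot_ppc(..., flatten=[]) would then group by the repeated item labels (step 3); pp here is a bare xarray Dataset, so the plotting call itself is omitted.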

Thank you. This is very helpful. I have gotten a lot further after this detailed post. The only thing that is not working is that labels are not showing up on the graph.

Here is a recap of what I did.

  1. Assign item numbers as coordinate values for both the observed data and the posterior predictive.
obs_item = idata_buyer.posterior["item"][idata_buyer.constant_data["ic_to_item_map_dim_0"]]
data = idata_buyer.assign_coords(predicted_eaches_dim_0=obs_item, groups=["observed_vars", 'posterior_predictive'])

This yields the following:

  2. Run a scatter ppc plot with two items
az.plot_ppc(data, flatten=[], coords={'predicted_eaches_dim_0':['100179K', '100186K']}, kind = 'scatter')

That yields the following:

As shown, the plots use labels “0” and “1” instead of “100179K” and “100186K” as I stated in the coords. I’m not sure why.

Also, when I try to run one item as a standard ppc, it returns a blank graph.

You mentioned above that the graphs would be blank until you changed the kind to ‘scatter’. I’m trying to mimic what you did after changing the school coordinates to “a” and “b” and plotting a normal ppc showing only school “a”. I’m not sure why mine will not mimic yours.

As you can see I’m further than I was, and progress is the goal! Thank you thus far.

This is awesome. I was not sure if there was a more efficient way to do this before now. I was just running az.plot_ppc and then manually visualizing groups using standard xarray indexing. I should have known that there is already built-in interplay.

As I’m going through this, I noticed something different from my resulting coordinate assignment compared to yours.

When you reassign “schools” in your example, the original schools names are replaced.

In mine, I have somehow added coordinates instead of just replacing them.

This is the original coordinates for posterior predictive:

As you can see above, there is only “predicted_eaches_dim_0” (in addition to chain and draw).

I want to use item seen below from my posterior to reassign to the posterior_predictive:

When I run the below code, it seems to just add extra coordinates to the posterior_predictive:

obs_item = idata_buyer.posterior["item"][idata_buyer.constant_data["ic_to_item_map_dim_0"]]
data = idata_buyer.assign_coords(predicted_eaches_dim_0=obs_item, groups=["observed_vars", 'posterior_predictive'])

Maybe that’s why labels aren’t showing up?

What happens if you use

data = idata_buyer.assign_coords(predicted_eaches_dim_0=obs_item.values, groups=["observed_vars", 'posterior_predictive'])

It seems the values you are assigning to the existing coord have different dimensions, which is a bit incoherent. With .values you’ll assign a NumPy array to the existing coord, so no dim info is available and it will be assumed to belong to that same dimension. In general I recommend very strongly to label the dimensions.
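To illustrate the .values point on a toy Dataset (all names are placeholders): labels that originate from another DataArray carry their own dim info, and stripping it with .values lets xarray attach the plain array to the existing dimension coordinate:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"predicted_eaches": (("predicted_eaches_dim_0",), np.zeros(3))})

# these labels come with their own dim ("item" here); .values drops that,
# so assign_coords relabels the existing predicted_eaches_dim_0 dimension
labels = xr.DataArray(["A", "B", "C"], dims="item")
ds = ds.assign_coords(predicted_eaches_dim_0=labels.values)
print(ds["predicted_eaches_dim_0"].values.tolist())  # ['A', 'B', 'C']
```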

It looks like I have two coordinates, both labeled predictive_eaches_dim_0.

When I try to run a grouped ppc on this using the below, I get the following error:

az.plot_ppc(data,
            flatten = [],
            coords={'predicted_eaches_dim_0': '100179K'})

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_38045/3269884987.py in <module>
      1 az.plot_ppc(data,
      2             flatten = [],
----> 3             coords={'predicted_eaches_dim_0': '100179K'})

/opt/conda/lib/python3.7/site-packages/arviz/plots/ppcplot.py in plot_ppc(data, kind, alpha, mean, observed, color, colors, grid, figsize, textsize, data_pairs, var_names, filter_vars, coords, flatten, flatten_pp, num_pp_samples, random_seed, jitter, animated, animation_kwargs, legend, labeller, ax, backend, backend_kwargs, group, show)
    354     # TODO: Add backend kwargs
    355     plot = get_plotting_function("plot_ppc", "ppcplot", backend)
--> 356     axes = plot(**ppcplot_kwargs)
    357     return axes

/opt/conda/lib/python3.7/site-packages/arviz/plots/backends/matplotlib/ppcplot.py in plot_ppc(ax, length_plotters, rows, cols, figsize, animated, obs_plotters, pp_plotters, predictive_dataset, pp_sample_ix, kind, alpha, colors, textsize, mean, observed, jitter, total_pp_samples, legend, labeller, group, animation_kwargs, num_pp_samples, backend_kwargs, show)
     98     backend_kwargs.setdefault("squeeze", True)
     99     if ax is None:
--> 100         fig, axes = create_axes_grid(length_plotters, rows, cols, backend_kwargs=backend_kwargs)
    101     else:
    102         axes = np.ravel(ax)

/opt/conda/lib/python3.7/site-packages/arviz/plots/backends/matplotlib/__init__.py in create_axes_grid(length_plotters, rows, cols, backend_kwargs)
     53     backend_kwargs = {**backend_kwarg_defaults(), **backend_kwargs}
     54 
---> 55     fig, axes = subplots(rows, cols, **backend_kwargs)
     56     extra = (rows * cols) - length_plotters
     57     if extra > 0:

/opt/conda/lib/python3.7/site-packages/matplotlib/pyplot.py in subplots(nrows, ncols, sharex, sharey, squeeze, subplot_kw, gridspec_kw, **fig_kw)
   1454     axs = fig.subplots(nrows=nrows, ncols=ncols, sharex=sharex, sharey=sharey,
   1455                        squeeze=squeeze, subplot_kw=subplot_kw,
-> 1456                        gridspec_kw=gridspec_kw)
   1457     return fig, axs
   1458 

/opt/conda/lib/python3.7/site-packages/matplotlib/figure.py in subplots(self, nrows, ncols, sharex, sharey, squeeze, subplot_kw, gridspec_kw)
    894         if gridspec_kw is None:
    895             gridspec_kw = {}
--> 896         gs = self.add_gridspec(nrows, ncols, figure=self, **gridspec_kw)
    897         axs = gs.subplots(sharex=sharex, sharey=sharey, squeeze=squeeze,
    898                           subplot_kw=subplot_kw)

/opt/conda/lib/python3.7/site-packages/matplotlib/figure.py in add_gridspec(self, nrows, ncols, **kwargs)
   1445 
   1446         _ = kwargs.pop('figure', None)  # pop in case user has added this...
-> 1447         gs = GridSpec(nrows=nrows, ncols=ncols, figure=self, **kwargs)
   1448         self._gridspecs.append(gs)
   1449         return gs

/opt/conda/lib/python3.7/site-packages/matplotlib/gridspec.py in __init__(self, nrows, ncols, figure, left, bottom, right, top, wspace, hspace, width_ratios, height_ratios)
    385         super().__init__(nrows, ncols,
    386                          width_ratios=width_ratios,
--> 387                          height_ratios=height_ratios)
    388 
    389     _AllowedKeys = ["left", "bottom", "right", "top", "wspace", "hspace"]

/opt/conda/lib/python3.7/site-packages/matplotlib/gridspec.py in __init__(self, nrows, ncols, height_ratios, width_ratios)
     51         if not isinstance(ncols, Integral) or ncols <= 0:
     52             raise ValueError(
---> 53                 f"Number of columns must be a positive integer, not {ncols!r}")
     54         self._nrows, self._ncols = nrows, ncols
     55         self.set_height_ratios(height_ratios)

ValueError: Number of columns must be a positive integer, not 0
<Figure size 0x460 with 0 Axes>

One says predictive, the other predicted.

Well that’s embarrassing. When I use the right coordinate, I get the following:

az.plot_ppc(data,
            flatten = [],
            coords={'predictive_eaches_dim_0': ['100179K']}, kind = 'scatter')

It doesn’t flatten beyond each observation for ‘100179K’; it seems to produce a graph for each observation. I’d like to see one graph for that item including all of its observations.

Should I run the model adding dims to the predicted_eaches?

Currently, this looks like the following:

eaches = pm.StudentT('predicted_eaches',
                     mu=mu,
                     sigma=sigma,
                     nu=15,
                     observed=observed_eaches)

Ok. @jessegrabowski helped me out with this post: PyMC+ArviZ: how to make the most of labeled coords and dims in PyMC 4.0 - Sharing - PyMC Discourse

I do have a multi-index with my hierarchical model that’s six levels deep.

I did the following to my arviz object:

idata.posterior_predictive.coords.update({'predicted_eaches_dim_0':m_idx})
idata.observed_data.coords.update({'predicted_eaches_dim_0':m_idx})
idata = idata.assign_coords(
    obs_id=m_idx,  # note also different length
    groups=["constant_data"]
)

That yielded the following:

So now I have repeating values for each level of my hierarchy plus the month and location of the sale made. When I run the below, I get the following.

az.plot_ppc(idata, flatten=[], coords = {'predicted_eaches_dim_0':'NUTRITION'},
           kind = 'scatter')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_85434/3645245510.py in <module>
      1 az.plot_ppc(idata, flatten=[], coords = {'predicted_eaches_dim_0':'NUTRITION'},
----> 2            kind = 'scatter')

[... same ArviZ/matplotlib call stack as in the earlier traceback ...]

ValueError: Number of columns must be a positive integer, not 0

NUTRITION is in the glbl_bus_ln_desc coordinate near the top of the ArviZ object, but when I put that in the coords argument, the error states I can only use predicted_eaches_dim_0.

It looks as if I have everything I want in the object, but accessing it continues to be an issue.

Do you see what I’m doing wrong?

Sorry, I missed the notification for your answer. MultiIndexes aren’t supported yet in plot_ppc. You can use them in coords, but the clever auto-groupby won’t happen for them.

As for coords, you are using predicted_eaches_dim_0, but that is the dimension (and the coordinate name of the MultiIndex itself). The coordinate that has NUTRITION among its values is glbl_bus_ln_desc.
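The distinction shows up in a small xarray sketch (names mirror the thread, values are made up): the MultiIndex lives on the predicted_eaches_dim_0 dimension, while each level, like glbl_bus_ln_desc, is its own coordinate, and selection goes through the level name:

```python
import numpy as np
import xarray as xr

# plain coordinates along the dimension, then promoted to a MultiIndex
ds = xr.Dataset(
    {"predicted_eaches": (("predicted_eaches_dim_0",), np.zeros(3))},
    coords={
        "glbl_bus_ln_desc": ("predicted_eaches_dim_0",
                             ["NUTRITION", "NUTRITION", "SNACKS"]),
        "item": ("predicted_eaches_dim_0",
                 ["100179K", "100186K", "103821K"]),
    },
)
ds = ds.set_index(predicted_eaches_dim_0=["glbl_bus_ln_desc", "item"])

# select by the level name, not the dimension name
subset = ds.sel(glbl_bus_ln_desc="NUTRITION")
print(subset["predicted_eaches"].shape)  # (2,): the two NUTRITION rows
```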

This blog post might be helpful in understanding the differences between dimensions, coordinates and indexes:

Thanks for following up. I learned to just take the actual values from my dataframe series and set the coords to them. When I want to look at several different views, I just keep changing the coords. Real easy now. I appreciate the help and the link.
