# How to keep dimensions in a posterior predictive sample?

Hi,
I’m just learning these techniques, and trying to understand how to use dimensions to deal with categorical data.

In my example, I’ve got a table of ‘Event Counts’ over a period of weeks, split up by ‘Event Type’. So week 1 might have 4 Blue events and 2 Green ones. I’ve set up this data in ‘long’ format, so the table has columns `Week`, `Event Type`, `Event Count`.

Event counts are Poisson distributed, so my model estimates a `mu` for each Event Type.

Using dimensions, I can estimate a posterior with dimensions `(chain: 4, draw: 1000, Event Types: 2)`.

I’d like to be able to create posterior predictive samples for each of the different dimensions. But when I try `pm.sample_posterior_predictive(trace)`, I end up with InferenceData whose only dimensions are:
`(chain: 4, draw: 1000, obs_id: 2000)`

I want to get samples from each of the ‘Event Types’ I’ve estimated parameters for (and plot them with az.plot_ppc), but I can’t figure out how. Thank you so much for advice!

Here is a whole example:

```python
import numpy as np
import pandas as pd
import pymc as pm
from scipy.stats import poisson

n = 1000
ns = np.arange(0, n)

blue_lam = 1.4
green_lam = 3

blue_dist = poisson(blue_lam)
green_dist = poisson(green_lam)
blues = blue_dist.rvs(n)
greens = green_dist.rvs(n)

dt = pd.DataFrame.from_dict({'blue': blues, 'green': greens})
dt = dt.stack().reset_index()
dt.columns = ["sample number", "event type", "Event Count"]

type_idx, types = pd.factorize(dt['event type'])
counts = dt['Event Count']
sample_idx, samples = pd.factorize(dt['sample number'])

coords = {
    "Event Types": types,
    "Samples": samples,
    "obs_id": np.arange(dt.shape[0]),
}

mdl = pm.Model(coords=coords)
with mdl:
    type_idx = pm.ConstantData("type_idx", type_idx, dims="obs_id")
    # sample_idx = pm.ConstantData("sample_idx", sample_idx, dims="obs_id")
    mu = pm.Uniform("mu", 0, 5, dims="Event Types")

    mu_ = mu[type_idx]

    event_count = pm.Poisson("event_count", mu=mu_, observed=counts, dims="obs_id")

    # this trace estimates good mu values for the different Event Types
    trace = pm.sample(return_inferencedata=True)

    # but the posterior predictive doesn't have them anymore, and I can't,
    # for example, create plots for the different Event Type counts
    # I've estimated to be likely
    post = pm.sample_posterior_predictive(trace)
```

As a beginner with this, I am very open to finding out I’m totally misunderstanding something. Thanks for any advice!

When you call `sample_posterior_predictive()`, you are asking for new, credible values for any observed variables in your model. You have a single observed variable in your model, `event_count`, which has the dimension `obs_id`. So when you ask for new values of `event_count`, they also have dimension `obs_id`.

I may be misunderstanding, but it sounds like you are looking for values of `mu`, which you can get by investigating your posterior: `trace.posterior["mu"]`. If that isn’t what you want, then I may need more information about what exactly you are looking for.
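For instance, the `mu` draws for a single event type can be pulled out of the posterior with xarray's `.sel` on the named dimension. Here is a minimal sketch; since I don't have your trace, a synthetic `xr.Dataset` with the same dims stands in for `trace.posterior`:

```python
import numpy as np
import xarray as xr

# Stand-in for trace.posterior: a Dataset shaped like the one pm.sample()
# returns above, with "mu" over (chain, draw, Event Types).
# The values here are made up; in the thread they come from PyMC.
rng = np.random.default_rng(0)
posterior = xr.Dataset(
    {"mu": (("chain", "draw", "Event Types"),
            rng.uniform(0, 5, size=(4, 1000, 2)))},
    coords={"Event Types": ["blue", "green"]},
)

# Select the mu draws for one event type by its coordinate label:
blue_mu = posterior["mu"].sel({"Event Types": "blue"})
print(blue_mu.dims)  # ('chain', 'draw')
```

With a real trace the same line reads `trace.posterior["mu"].sel({"Event Types": "blue"})`.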

Hi,

Thanks! I think you’re right that I’ve got dimensions wrong, in some way.

What I’d like to do is get separate samples of Event Counts for both Green and Blue ‘Event Types’. And a `plot_ppc` that plots Green and Blue Events separately.

Does that make sense?

The ‘posterior’ estimated by sampling is giving me potential ‘mu’ values, and it gives me different mu values for different Event Types, which is what I want -

How do I get posterior predictive samples and plots split up in the same way, by ‘Event Type’?

Thank you!

Someone more clever than I can chime in with simpler solutions, but I would do one of these:

• use the values of `mu` in your posterior to parameterize 2 Poisson distributions and use those to generate credible observations associated with each event type
• group your posterior predictions by event type and average all the posterior predictive samples in each group

But like I said, there may be a quicker, more concise solution that I can’t think of at the moment.
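The second option can be sketched with plain boolean indexing along the `obs_id` axis, reusing the `type_idx` array that already maps each long-format row to its event type. The arrays below are synthetic stand-ins for the real posterior predictive values:

```python
import numpy as np

# Stand-ins for objects from the thread: `type_idx` maps each of the
# 2000 long-format rows to an event type (0 = blue, 1 = green), and
# `pp` mimics post.posterior_predictive["event_count"].values with
# shape (chain, draw, obs_id). Values here are made up.
rng = np.random.default_rng(1)
type_idx = np.tile([0, 1], 1000)                # alternating blue/green rows
pp = rng.poisson(lam=2.0, size=(4, 1000, 2000))

# Group the posterior predictive draws by event type via a boolean
# mask on the obs_id axis:
blue_pp = pp[..., type_idx == 0]
green_pp = pp[..., type_idx == 1]

print(blue_pp.shape, green_pp.shape)  # (4, 1000, 1000) (4, 1000, 1000)
```

Each group can then be averaged or histogrammed separately per event type.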

Thanks again. Here’s what I came up with:

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde, poisson

plausible_counts = dict()
fig, axs = plt.subplots(1, 2)
for i, (etype, mus) in enumerate(trace.posterior.groupby('Event Types')):
    mus_for_type = mus.stack(n=('chain', 'draw'))
    num_mus = 100
    num_samples = 200
    # smooth the posterior mu draws and resample plausible mu values
    possible_mus = gaussian_kde(mus_for_type['mu']).resample(num_mus)
    plausible_counts[etype] = poisson(possible_mus).rvs([num_samples, num_mus]).reshape(num_mus * num_samples)
    axs[i].hist(plausible_counts[etype], bins=30, density=True, stacked=True)
    axs[i].axvline(plausible_counts[etype].mean(), color='red')
    axs[i].set_xlabel(f"Event type: {etype}")

for etype, counts in plausible_counts.items():
    print(f"{etype} | Mean: {counts.mean()}, 15%: {np.quantile(counts, 0.15)}, 85%: {np.quantile(counts, 0.85)}")
```

And the results look alright to me -

Thanks for the help!


Hi, I’m still mulling this question, and I have another solution (in case someone like me happens across this in the future).

I believe I was overthinking the model. I actually wanted to be observing two totally different variables, so the dimensions of `event_count` needed to reflect that. The following does exactly what I wanted.

Instead of keeping the data in ‘long’ format, I keep it wide, with separate columns for each ‘event type’.

```python
import numpy as np
import pandas as pd
import pymc as pm
from scipy.stats import poisson

n = 1000
ns = np.arange(0, n)

blue_lam = 1.4
green_lam = 3

blue_dist = poisson(blue_lam)
green_dist = poisson(green_lam)
blues = blue_dist.rvs(n)
greens = green_dist.rvs(n)

dt = pd.DataFrame.from_dict({'blue': blues, 'green': greens})

types = dt.columns

coords = {
    "Event Types": types,
    "obs_id": np.arange(dt.shape[0]),
}

mdl = pm.Model(coords=coords)
with mdl:
    mu = pm.Uniform("mu", 0, 5, dims="Event Types")
    event_count = pm.Poisson("event_count", mu=mu, observed=dt, dims=("obs_id", "Event Types"))
    trace = pm.sample(return_inferencedata=True)
```

The posterior predictive samples now give me samples for the different event types. (Note that I need to `flatten` the posterior predictive plot over the dimension I don’t want to see separate plots for.)

```python
import arviz as az

with mdl:
    post = pm.sample_posterior_predictive(trace)
    az.plot_ppc(post, flatten=["obs_id"])
```
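With the wide model, the per-type summaries from the earlier workaround also fall out of a single `.sel` on the posterior predictive. A sketch below, with a synthetic `DataArray` standing in for `post.posterior_predictive["event_count"]` (its dims and coord labels match the model above; the values are made up):

```python
import numpy as np
import xarray as xr

# Stand-in for post.posterior_predictive["event_count"]:
# dims (chain, draw, obs_id, Event Types), labelled like the wide model.
rng = np.random.default_rng(2)
pp = xr.DataArray(
    rng.poisson(lam=[1.4, 3.0], size=(4, 1000, 50, 2)),
    dims=("chain", "draw", "obs_id", "Event Types"),
    coords={"Event Types": ["blue", "green"]},
)

# One .sel per event type replaces the KDE/resample loop entirely:
for etype in pp["Event Types"].values:
    counts = pp.sel({"Event Types": etype})
    print(f"{etype} | Mean: {float(counts.mean()):.2f}, "
          f"15%: {float(counts.quantile(0.15))}, "
          f"85%: {float(counts.quantile(0.85))}")
```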