How to keep dimensions in a posterior predictive sample?

Hi,
I’m just learning these techniques, and trying to understand how to use dimensions to deal with categorical data.

In my example, I’ve got a table of ‘Event Counts’ over a period of weeks, split up by ‘Event Type’. So week 1 might have 4 Blue events and 2 Green ones. I’ve set up this data in ‘long’ format, so the table has columns Week,Event Type, Event Count.

Event counts are Poisson distributed, so my model estimates a mu for each Event Type.

Using dimensions, I can estimate a posterior with dimensions (chain: 4, draw: 1000, Event Types: 2)

I’d like to be able to create posterior predictive samples for each of the different dimensions. But when I try pm.sample_posterior_predictive(trace), I end with inferencedata with only dimensions:
(chain: 4, draw: 1000, obs_id: 2000)

I want to get samples from each of the ‘Event Types’ I’ve estimated parameters for (and plot them with az.plot_ppc), but I can’t figure out how. Thank you so much for advice!

Here is a whole example:

n = 1000
ns = np.arange(0,n)

blue_lam = 1.4
green_lam = 3

blue_dist = poisson(blue_lam)
green_dist = poisson(green_lam)
blues = blue_dist.rvs(n)
greens = green_dist.rvs(n)


dt = pd.DataFrame.from_dict({'blue': blues, 'green':greens})
dt = dt.stack().reset_index()
dt.columns = ["sample number", "event type", "Event Count"]


type_idx, types = pd.factorize(dt['event type'])
counts = dt['Event Count']
sample_idx, samples = pd.factorize(dt['sample number'])


coords = {
    "Event Types": types,
    "Samples": samples,
    "obs_id": np.arange(dt.shape[0])
}

mdl = pm.Model(coords=coords)
with mdl:
    type_idx = pm.ConstantData("type_idx", type_idx, dims=("obs_id"))
    #sample_idx = pm.ConstantData("sample_idx", sample_idx, dims=("obs_id"))
    mu = pm.Uniform("mu",0,5, dims=("Event Types"))
    
    mu_ = mu[type_idx]
    
    event_count = pm.Poisson("event_count", mu=mu_, observed=counts, dims = ("obs_id"))
    
    # this trace estimates good mu values for the different Event Types
    trace = pm.sample(return_inferencedata=True)

    # but the posterior predictive doesn't have them anymore. And I can't, for example, create plots
    # for the different Event Type counts I've estimated to be likely.
    post = pm.sample_posterior_predictive(trace)


As a beginner with this, I am very open to finding out I’m totally misunderstanding something. Thanks for any advice!

When you call sample_posterior_predictive(), you are asking for new, credible values for any observed variables in your model. You have a single observed variable in your mode, event_count, which has dimension, obs_id. So when you ask for new values of event_count, they also have dimension obs_id.

I may be misunderstanding, but it sounds like you are looking for values of mu, which you can get by investigating your posterior: trace.posterior["mu"]. If that isn’t what you want, then I may need more information about what exactly you are looking for.

Hi,

Thanks! I think you’re right that I’ve got dimensions wrong, in some way.

What I’d like to do is get separate samples of Event Counts for both Green and Blue ‘Event Types’. And a plot_ppc that plots Green and Blue Events separately.

Does that make sense?

The ‘posterior’ estimated by sampling is giving me potential ‘mu’ values, and it gives me different mu values for different Event Types, which is what I want -

How do I get posterior predictive samples and plots split up in the same way, by ‘Event Type’?

Thank you!

Someone more clever than I can chime in with simpler solutions, but I would do one of these;

  • use the values of mu in your posterior to parameterize 2 Poisson distributions and use those to generate credible observations associated with each event type
  • group your posterior predictions by event type and average all the posterior predictive samples in each group

But like I said, there may a quicker, more concise solution that I can’t think of at the moment.

Thanks again. Here’s what I came up with:

plausible_counts = dict()
fig, axs = plt.subplots(1,2)
for i, (etype, mus) in enumerate(trace.posterior.groupby('Event Types')): 
    mus_for_type = mus.stack(n=('chain', 'draw'))
    num_mus = 100
    num_samples = 200
    possible_mus = gaussian_kde(mus_for_type['mu']).resample(num_mus)
    plausible_counts[etype] = poisson(possible_mus).rvs([num_samples, num_mus]).reshape(num_mus * num_samples)
    axs[i].hist(plausible_counts[etype], bins=30, density=True, stacked=True)
    axs[i].axvline(plausible_counts[etype].mean(),color='red')
    axs[i].set_xlabel(f"Event type: {etype}")
    
    
for etype, counts in plausible_counts.items():
    print(f"{etype} | Mean: {counts.mean()}, 15%: {np.quantile(counts, 0.15)}, 85%: {np.quantile(counts, 0.85)}")

And the results look alright to me -
image

Thanks for the help!

1 Like

Hi, I’m still mulling this question, and I have another solution (in case someone like me happens across this in the future)

I believe I was overthinking the model. I actually wanted to be observing two totally different variables, so the dimensions of event_count needed to reflect that. The following does exactly what I wanted.

Instead of keeping the data in a ‘long’ format, i keep it wide, with separate columns for each ‘event type’.

n = 1000
ns = np.arange(0,n)

blue_lam = 1.4
green_lam = 3

blue_dist = poisson(blue_lam)
green_dist = poisson(green_lam)
blues = blue_dist.rvs(n)
greens = green_dist.rvs(n)

dt = pd.DataFrame.from_dict({'blue': blues, 'green':greens})

types = dt.columns

coords = {
    "Event Types": types,
    "obs_id": np.arange(dt.shape[0])
}

mdl = pm.Model(coords=coords)
with mdl:
    mu = pm.Uniform("mu",0,5, dims=("Event Types"))
    event_count = pm.Poisson("event_count", mu=mu, observed=dt, dims = ("obs_id","Event Types"))

The posterior predictive samples now give me samples for the different event types. (note that I need to flatten the posterior predictive plot by the dimension i don’t want to see separate plots for.

with mdl:
    post = pm.sample_posterior_predictive(trace)
az.plot_ppc(post, flatten=["obs_id"])