How to keep dimensions in a posterior predictive sample?

FleeceJune · July 19, 2022, 4:36pm

Hi,
I’m just learning these techniques, and trying to understand how to use dimensions to deal with categorical data.

In my example, I’ve got a table of ‘Event Counts’ over a period of weeks, split up by ‘Event Type’. So week 1 might have 4 Blue events and 2 Green ones. I’ve set up this data in ‘long’ format, so the table has columns Week,Event Type, Event Count.

Event counts are Poisson distributed, so my model estimates a mu for each Event Type.

Using dimensions, I can estimate a posterior with dimensions (chain: 4, draw: 1000, Event Types: 2)

I’d like to be able to create posterior predictive samples for each of the different dimensions. But when I try pm.sample_posterior_predictive(trace), I end with inferencedata with only dimensions:
(chain: 4, draw: 1000, obs_id: 2000)

I want to get samples from each of the ‘Event Types’ I’ve estimated parameters for (and plot them with az.plot_ppc), but I can’t figure out how. Thank you so much for advice!

Here is a whole example:

n = 1000
ns = np.arange(0,n)

blue_lam = 1.4
green_lam = 3

blue_dist = poisson(blue_lam)
green_dist = poisson(green_lam)
blues = blue_dist.rvs(n)
greens = green_dist.rvs(n)


dt = pd.DataFrame.from_dict({'blue': blues, 'green':greens})
dt = dt.stack().reset_index()
dt.columns = ["sample number", "event type", "Event Count"]


type_idx, types = pd.factorize(dt['event type'])
counts = dt['Event Count']
sample_idx, samples = pd.factorize(dt['sample number'])


coords = {
    "Event Types": types,
    "Samples": samples,
    "obs_id": np.arange(dt.shape[0])
}

mdl = pm.Model(coords=coords)
with mdl:
    type_idx = pm.ConstantData("type_idx", type_idx, dims=("obs_id"))
    #sample_idx = pm.ConstantData("sample_idx", sample_idx, dims=("obs_id"))
    mu = pm.Uniform("mu",0,5, dims=("Event Types"))
    
    mu_ = mu[type_idx]
    
    event_count = pm.Poisson("event_count", mu=mu_, observed=counts, dims = ("obs_id"))
    
    # this trace estimates good mu values for the different Event Types
    trace = pm.sample(return_inferencedata=True)

    # but the posterior predictive doesn't have them anymore. And I can't, for example, create plots
    # for the different Event Type counts I've estimated to be likely.
    post = pm.sample_posterior_predictive(trace)

As a beginner with this, I am very open to finding out I’m totally misunderstanding something. Thanks for any advice!

cluhmann · July 19, 2022, 6:32pm

When you call sample_posterior_predictive(), you are asking for new, credible values for any observed variables in your model. You have a single observed variable in your mode, event_count, which has dimension, obs_id. So when you ask for new values of event_count, they also have dimension obs_id.

I may be misunderstanding, but it sounds like you are looking for values of mu, which you can get by investigating your posterior: trace.posterior["mu"]. If that isn’t what you want, then I may need more information about what exactly you are looking for.

FleeceJune · July 19, 2022, 7:20pm

Hi,

Thanks! I think you’re right that I’ve got dimensions wrong, in some way.

What I’d like to do is get separate samples of Event Counts for both Green and Blue ‘Event Types’. And a plot_ppc that plots Green and Blue Events separately.

Does that make sense?

The ‘posterior’ estimated by sampling is giving me potential ‘mu’ values, and it gives me different mu values for different Event Types, which is what I want -

How do I get posterior predictive samples and plots split up in the same way, by ‘Event Type’?

Thank you!

cluhmann · July 19, 2022, 8:28pm

Someone more clever than I can chime in with simpler solutions, but I would do one of these;

use the values of mu in your posterior to parameterize 2 Poisson distributions and use those to generate credible observations associated with each event type
group your posterior predictions by event type and average all the posterior predictive samples in each group

But like I said, there may a quicker, more concise solution that I can’t think of at the moment.

FleeceJune · July 20, 2022, 1:45am

Thanks again. Here’s what I came up with:

plausible_counts = dict()
fig, axs = plt.subplots(1,2)
for i, (etype, mus) in enumerate(trace.posterior.groupby('Event Types')): 
    mus_for_type = mus.stack(n=('chain', 'draw'))
    num_mus = 100
    num_samples = 200
    possible_mus = gaussian_kde(mus_for_type['mu']).resample(num_mus)
    plausible_counts[etype] = poisson(possible_mus).rvs([num_samples, num_mus]).reshape(num_mus * num_samples)
    axs[i].hist(plausible_counts[etype], bins=30, density=True, stacked=True)
    axs[i].axvline(plausible_counts[etype].mean(),color='red')
    axs[i].set_xlabel(f"Event type: {etype}")
    
    
for etype, counts in plausible_counts.items():
    print(f"{etype} | Mean: {counts.mean()}, 15%: {np.quantile(counts, 0.15)}, 85%: {np.quantile(counts, 0.85)}")

And the results look alright to me -

Thanks for the help!

FleeceJune · August 2, 2022, 1:52am

Hi, I’m still mulling this question, and I have another solution (in case someone like me happens across this in the future)

I believe I was overthinking the model. I actually wanted to be observing two totally different variables, so the dimensions of event_count needed to reflect that. The following does exactly what I wanted.

Instead of keeping the data in a ‘long’ format, i keep it wide, with separate columns for each ‘event type’.

n = 1000
ns = np.arange(0,n)

blue_lam = 1.4
green_lam = 3

blue_dist = poisson(blue_lam)
green_dist = poisson(green_lam)
blues = blue_dist.rvs(n)
greens = green_dist.rvs(n)

dt = pd.DataFrame.from_dict({'blue': blues, 'green':greens})

types = dt.columns

coords = {
    "Event Types": types,
    "obs_id": np.arange(dt.shape[0])
}

mdl = pm.Model(coords=coords)
with mdl:
    mu = pm.Uniform("mu",0,5, dims=("Event Types"))
    event_count = pm.Poisson("event_count", mu=mu, observed=dt, dims = ("obs_id","Event Types"))

The posterior predictive samples now give me samples for the different event types. (note that I need to flatten the posterior predictive plot by the dimension i don’t want to see separate plots for.

with mdl:
    post = pm.sample_posterior_predictive(trace)
az.plot_ppc(post, flatten=["obs_id"])

Topic		Replies	Views
Getting Point Estimates from Posterior to add to the Data Frame v5	4	546	May 12, 2022
Posterior predictive sampling with data variance Questions	10	2206	September 14, 2018
How to generate posterior predictive samples with size different than the observed variable? Questions	1	334	June 16, 2021
Cannot plot_posterior with several dimensions Questions	1	330	January 8, 2021
Number of samples generated by `sample_posterior_predictive` Questions	3	794	October 7, 2020

How to keep dimensions in a posterior predictive sample?

Related Topics