Hi,
I’m just learning these techniques, and trying to understand how to use dimensions to deal with categorical data.
In my example, I’ve got a table of ‘Event Counts’ over a period of weeks, split up by ‘Event Type’. So week 1 might have 4 Blue events and 2 Green ones. I’ve set up this data in ‘long’ format, so the table has columns Week, Event Type, and Event Count.
Event counts are Poisson distributed, so my model estimates a mu for each Event Type.
Using dimensions, I can estimate a posterior with dimensions (chain: 4, draw: 1000, Event Types: 2).
I’d like to be able to create posterior predictive samples for each of the different dimensions. But when I try pm.sample_posterior_predictive(trace), I end up with InferenceData that has only these dimensions:
(chain: 4, draw: 1000, obs_id: 2000)
I want to get samples from each of the ‘Event Types’ I’ve estimated parameters for (and plot them with az.plot_ppc), but I can’t figure out how. Thank you so much for advice!
Here is a whole example:
import numpy as np
import pandas as pd
import pymc as pm
from scipy.stats import poisson

# simulate 1000 weekly counts for each of two event types
n = 1000
ns = np.arange(0, n)
blue_lam = 1.4
green_lam = 3
blue_dist = poisson(blue_lam)
green_dist = poisson(green_lam)
blues = blue_dist.rvs(n)
greens = green_dist.rvs(n)

# reshape to long format: one row per (sample number, event type)
dt = pd.DataFrame.from_dict({'blue': blues, 'green': greens})
dt = dt.stack().reset_index()
dt.columns = ["sample number", "event type", "Event Count"]
type_idx, types = pd.factorize(dt['event type'])
counts = dt['Event Count']
sample_idx, samples = pd.factorize(dt['sample number'])

coords = {
    "Event Types": types,
    "Samples": samples,
    "obs_id": np.arange(dt.shape[0]),
}
mdl = pm.Model(coords=coords)
with mdl:
    type_idx = pm.ConstantData("type_idx", type_idx, dims="obs_id")
    #sample_idx = pm.ConstantData("sample_idx", sample_idx, dims="obs_id")
    mu = pm.Uniform("mu", 0, 5, dims="Event Types")
    mu_ = mu[type_idx]
    event_count = pm.Poisson("event_count", mu=mu_, observed=counts, dims="obs_id")
    # this trace estimates good mu values for the different Event Types
    trace = pm.sample(return_inferencedata=True)
    # but the posterior predictive doesn't have them anymore. And I can't, for example, create plots
    # for the different Event Type counts I've estimated to be likely.
    post = pm.sample_posterior_predictive(trace)
As a beginner with this, I am very open to finding out I’m totally misunderstanding something. Thanks for any advice!
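To make what I'm after concrete, here is a minimal sketch in plain NumPy (not PyMC) of the split I'd like to end up with. It assumes the posterior predictive array has shape (chain, draw, obs_id) and that type_idx holds the integer code of the event type for each obs_id, as in the example above. The small sizes and the simulated pp array are stand-ins I made up for illustration, not output from the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small sizes so the sketch runs quickly.
n_chain, n_draw, n_per_type = 2, 50, 10
type_names = ["blue", "green"]

# The stacked long-format table alternates blue, green, blue, green, ...
type_idx = np.tile([0, 1], n_per_type)

# Stand-in for the posterior predictive draws of event_count,
# shape (chain, draw, obs_id); simulated here, not sampled from the model.
lam = np.array([1.4, 3.0])
pp = rng.poisson(lam[type_idx], size=(n_chain, n_draw, type_idx.size))

# Keep only the obs_id columns that belong to each event type.
pp_by_type = {
    name: pp[..., type_idx == k] for k, name in enumerate(type_names)
}

for name, samples in pp_by_type.items():
    print(name, samples.shape, samples.mean())
```

With the real objects, pp would presumably be post.posterior_predictive["event_count"].values and type_idx the codes returned by pd.factorize, so each entry of pp_by_type would hold the predictive draws for one Event Type.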