Not Understanding the Posterior

Hi

About the theta: are you perhaps confused by the indexing? Suppose you have 5 observations from 2 groups. The index is then idx = np.array([0, 0, 0, 1, 1]), which says that the first 3 observations are from the first group (or county, index 0) and the last 2 are from the second group (index 1). If you then have a distribution with one entry for the expected value of each group, you can use the indexing variable to copy those expected values automatically so they follow the same order as your observed data:

county = pm.Normal(..., dims="county")  # [countyA, countyB]
theta = county[idx]  # [countyA, countyA, countyA, countyB, countyB]
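The indexing step itself can be tried in plain NumPy, outside any model. This is a minimal sketch: the county labels and the numbers in `county_means` are made up for illustration, and building `idx` with `np.unique(..., return_inverse=True)` is just one convenient way to turn labels into integer indices. Indexing a PyMC variable with `county[idx]` follows the same pattern.

```python
import numpy as np

# Five observations: the first three come from countyA, the last two from countyB.
county_names = np.array(["countyA", "countyA", "countyA", "countyB", "countyB"])

# return_inverse=True gives, for each observation, the position of its label
# in the sorted array of unique labels; that is exactly the index vector.
counties, idx = np.unique(county_names, return_inverse=True)

# Hypothetical expected value per county, in the same order as `counties`.
county_means = np.array([10.0, 20.0])

# Fancy indexing copies each county's value to every observation from that county.
theta = county_means[idx]

print(counties)  # ['countyA' 'countyB']
print(idx)       # [0 0 0 1 1]
print(theta)     # [10. 10. 10. 20. 20.]
```

Because `theta` has one entry per observation, it lines up element by element with the observed data.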

About the sampling question, I am not sure what you mean. It is possible to define the model once and sample from it much later by re-entering the model's context:

import pymc as pm

with pm.Model() as m:
    ...
    like = pm.Normal(..., observed=data)

# ... a lot of unrelated Python code ...

with m:
    trace = pm.sample()

My general advice, if things feel confusing, is to work with a much smaller model first (a couple of groups with a few observations each) to see if you can make sense of all the moving parts before returning to the full model.

When you are unsure whether your model is in line with your data, you can use toy data that you generate yourself from parameter values you choose. If your model gives values close to those known values, you can be more confident that you are doing the right thing.
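A minimal sketch of that toy-data check, assuming two groups with known means and a shared noise level (all numbers here are arbitrary choices for illustration). To keep it self-contained, the per-group sample mean stands in for the fitted model; with a real PyMC model you would instead compare the posterior means from `pm.sample()` against `true_group_means`.

```python
import numpy as np

rng = np.random.default_rng(42)

# "True" values we pick ourselves, so we know the right answer in advance.
true_group_means = np.array([1.0, 5.0])
sigma = 0.5
n_per_group = 50

# Index vector: which group each simulated observation belongs to.
idx = np.repeat(np.arange(len(true_group_means)), n_per_group)

# Simulate toy observations from the known group means.
data = rng.normal(loc=true_group_means[idx], scale=sigma)

# Stand-in for fitting the model: estimate each group's mean from its observations.
estimates = np.array([data[idx == g].mean() for g in range(len(true_group_means))])

print(estimates)  # should be close to [1.0, 5.0]
```

If the estimates (or, in the real case, the posterior means) land far from the values you simulated with, that usually points to a problem in the model or in the indexing rather than in the data.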

I hope my answer was helpful.
