Not Understanding the Posterior

Hello - I’m really enjoying the library and hoping someone can clarify some conceptual things for me. I’ve watched 9 hours of SciPy presentations and gone through multiple tutorial notebooks and am still unclear on a few points.

I’m building a model very similar to the Partial Pooling Model demonstrated here.

This is the snippet I’m interested in:

with pm.Model(coords=coords) as partial_pooling:
    county_idx = pm.Data("county_idx", county, dims="obs_id")
    # Hyperpriors:
    a = pm.Normal("a", mu=0.0, sigma=10.0)
    sigma_a = pm.Exponential("sigma_a", 1.0)

    # Varying intercepts:
    a_county = pm.Normal("a_county", mu=a, sigma=sigma_a, dims="County")

    # Expected value per county:
    theta = a_county[county_idx]
    # Model error:
    sigma = pm.Exponential("sigma", 1.0)

    y = pm.Normal("y", theta, sigma=sigma, observed=log_radon, dims="obs_id")
pm.model_to_graphviz(partial_pooling)

What I don’t understand is why theta is called the “Expected value per county”. Why is the likelihood (y) not the thing you sample when you want sampled distributions for each group in the hierarchy?

I’m trying to do something very similar with a hierarchical model where I estimate distributions of groups in a single model, but I’m not sure I’m comparing the distributions from PyMC3 to my original data correctly.

In my code I convert my trace “theta”, which is a (4000, n_groups) shaped array, using az.from_pymc3() into sampled distributions for each group and plot them against the original data. But I can also convert my likelihood (y) into an array of shape (4000, size_of_original_data) and plot that against the original data, and both look reasonable. I’m 99% sure that if I want to compare my univariate original data distributions to the sampled expected distributions I should use “theta”, as shown in all the tutorials, but I don’t understand why. ArviZ’s built-in plots are helpful, but I need to understand the raw sampled data for added flexibility. I’ll add my code below; thanks for any insights or reference material.

coords = {'groups':data['hierarchy'].unique(),
          'obs_id':np.arange(len(data))
         }

with pm.Model(coords=coords) as model_partial_pooling:
    # Hyperpriors for group nodes
    alpha_a = pm.HalfNormal('alpha_a', sigma=observed_mean)
    beta_a = pm.HalfNormal('beta_a', sigma=1.)

    a = pm.Gamma('a', alpha=alpha_a, beta=beta_a, dims='groups')

    # Expected value per groups
    ac_est = a[groups_idx]
    # Model error
    eps = pm.HalfCauchy('eps', beta=1.)
    
    # Data likelihood
    ac_dists = pm.Normal('ac_dists',
                         mu=ac_est,
                         sigma=eps,
                         observed=data['scaled_target'],
                         dims='obs_id'
                        )
    
    prior_partial_pooling = pm.sample_prior_predictive()

# Inference button (TM)!
with model_partial_pooling:
    trace_partial_pooling = pm.sample(draws=2000,
                                      tune=2000,                    
                                      cores=1,
                                      #return_inferencedata=True,
                                      init='adapt_diag')
    
with model_partial_pooling:
    ppc_partial_pooling = pm.sample_posterior_predictive(
        trace_partial_pooling, var_names=['alpha_a',
                                          'a',
                                          'eps',
                                          'ac_dists']
    )
    
data_partial_pooling = az.from_pymc3(trace=trace_partial_pooling, 
                                     prior=prior_partial_pooling,
                                     posterior_predictive=ppc_partial_pooling,
                                     model=model_partial_pooling
                                     )

### This is the dataframe where I compare each group to the original groups data ###
# Assign the groups as column names to their posterior distribution
ac_groups = data_partial_pooling.posterior_predictive.groups
ac_ppc_partial_pooling = data_partial_pooling.posterior_predictive.a[0]\
                                             .to_dataframe().reset_index()
assert ac_ppc_partial_pooling['chain'].nunique() == 1
ac_ppc_partial_pooling.drop(['draw', 'chain'], axis=1, inplace=True)
ac_ppc_partial_pooling.columns = ['group', 'data']
ac_ppc_partial_pooling = ac_ppc_partial_pooling.pivot(columns='group')
ac_ppc_partial_pooling.columns = ac_ppc_partial_pooling.columns.get_level_values(1)
ac_ppc_partial_pooling.head()

[Plot of my “ac_ppc_partial_pooling” DataFrame against the original data for each group]
The poor fit is another problem, but first I want to ensure I’m even comparing the right things.

Hi

About theta: are you perhaps confused by the indexing? The idea is that you might have 5 observations from 2 groups, and the index idx = np.array([0, 0, 0, 1, 1]) says that the first 3 observations are from the first group (or county) and the last 2 are from the second. If you then have two distributions, one for the expected value of each group, you can use the indexing variable to automatically copy the expected values so they follow the same order as your observed data:

county = pm.Normal(... dims="county")  # [countyA, countyB]
theta = county[idx]  # [countyA, countyA, countyA, countyB, countyB]
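The same fancy-indexing behaviour can be checked with plain NumPy (the numbers here are made up just to illustrate):

```python
import numpy as np

# Pretend posterior mean for each of 2 groups (one value per group)
group_means = np.array([1.5, -0.3])  # [countyA, countyB]

# Which group each of the 5 observations belongs to
idx = np.array([0, 0, 0, 1, 1])

# Fancy indexing broadcasts the group means out to per-observation order
theta = group_means[idx]
print(theta)  # [ 1.5  1.5  1.5 -0.3 -0.3]
```

Inside a PyMC model the same expression works on random variables, which is why `theta = a_county[county_idx]` lines up one expected value with each observation.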

About the sampling question, I am not sure what you mean. It is possible to write the model once and only sample it much later:

with pm.Model() as m:
    ...
    like = pm.Normal(... observed=data)

< a lot of unrelated Python code>

with m:
    trace = pm.sample()

My general advice if things feel confusing is to work with a much smaller model first (a couple of groups and observations per group) to see if you can make sense of all the moving parts before returning to the full model.

When you are unsure if your model is in line with your data you can use toy data that you generate yourself. If your model gives values close to what you know to be the right answer you can be more confident that you are doing the right thing.
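For example, a toy hierarchical dataset with a known ground truth could be generated like this (all the numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Known ground truth: 3 groups with different means and shared noise
true_means = np.array([2.0, 5.0, 8.0])
true_sigma = 0.5
n_per_group = 20

# Group membership index, same style as the model's groups_idx
groups_idx = np.repeat([0, 1, 2], n_per_group)

# Observed data: each observation drawn around its group's true mean
y = rng.normal(loc=true_means[groups_idx], scale=true_sigma)
```

After fitting the model to `y`, the posterior for each group mean should concentrate near 2.0, 5.0, and 8.0; if it does not, something in the model or the comparison code is off.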

I hope my answer was helpful.


Hey, thanks for attempting to answer my rambling question. Your indexing example did help me get a better grasp of how that works inside PyMC3. I watched another Chris Fonnesbeck video and realized where I was conceptually confused.

I had a different goal from the tutorial: they were demonstrating how to make a prediction using the expected value, while I was attempting to plot the posterior predictive sampled distribution from PyMC3 over my original data. “theta” really is a distribution of the expected value, or mean. The likelihood “y” is what I was after to complete my objective. What doubly confused me was that when I plotted my expected value over the original data, some groups looked good, but only because groups with few samples had a very wide mean estimate; groups with more samples looked wrong because their expected value estimate was more certain and therefore much narrower than the raw data.
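The distinction can be sketched with plain NumPy: draws of the group mean describe only where the mean is, while posterior predictive draws add the observation noise back in (the numbers below are invented, not from any real trace):

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 4000

# Pretend posterior draws for one group's mean: narrow, mean-only uncertainty
theta_draws = rng.normal(loc=3.0, scale=0.1, size=n_draws)

# Pretend draws of the model error sigma (held fixed here for simplicity)
eps_draws = np.full(n_draws, 1.0)

# Posterior predictive: resample an observation around each theta draw
y_draws = rng.normal(loc=theta_draws, scale=eps_draws)

# theta_draws is narrow (mean only); y_draws is wide (mean + noise),
# so y_draws is what should be compared with the spread of the raw data.
print(theta_draws.std(), y_draws.std())
```

This is exactly why the expected-value draws can look deceptively good over sparse groups: their width reflects uncertainty in the mean, not the spread of the observations.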

There is a lot to wrap my head around coming from a more traditional ML background.