How do we predict on new unseen groups in a hierarchical model in PyMC3?

This was a really helpful post for beginning to understand how to sample the posterior distributions of a hierarchical model. However, I did have two questions regarding the determination of y_hat for multiple observations and how to sample different levels of a hierarchical model.

  1. With the current construction, the dot product in y_hat produces a n x n matrix for n observations. From my understanding, shouldn’t y_hat be the same shape as y, i.e. n x 1 ?

I’ve modified this non-centered model, using the HierarchicalLogisticRegression module from Nicole Carlson’s “parsing-science” project as a guide, to arrive at a modified hierarchical model:

def model_factory(X, y, site_shared, n_site, n_features=None):
    if n_features is None:
        n_features = X.shape[-1]
    with pm.Model() as model:
        mu_beta = pm.Normal('mu_beta', mu=0., sd=1,shape=(n_features,))
        sigma_beta = pm.HalfCauchy('sigma_beta', 5,shape=(n_features,))
        a = pm.Normal('a', mu=0., sd=1)
        b = pm.Normal('b', mu=0, sd=1, shape=(n_site, n_features))
        betas = mu_beta + sigma_beta * b
        y_hat = a + tt.sum(betas[site_shared]*X, 1)
        pm.Bernoulli('y_like', logit_p=y_hat, observed=y)
    return model

Is this an appropriate modification to the original model to create a non-centered hierarchical model using different priors for each feature across multiple observations?

  1. I’m still a little confused about how to sample the posterior distribution for the population level of the hierarchy.

Say you want to make a prediction for unseen data on different levels of the hierarchy. In the first case, you don’t know which site your data belongs to and you simply want to sample the population parameters for a prediction. Is this when you sample using only [‘mu_beta’, ‘sigma_beta’, ‘a’] and run the sample_posterior_predictive on the model using those traces?

# Population based prediction, site_shared doesn't matter
with model_factory(X = new_site_X,
                   y = np.empty(ntrain_obs, dtype=np.int32),
                   site_shared = np.random.randint(ntrain_site, size=ntrain_obs), # Placeholder because the actual site isn't known
                   n_site = ntrain_site) as pop_model:
  
  df = pm.trace_to_dataframe(train_trace,
                             varnames=['mu_beta', 'sigma_beta', 'a'],
                             include_transformed=True)
  ppc_pop = pm.sample_posterior_predictive(trace=df.to_dict('records'),
                                           samples=1000)

Conversely, say you know the site (site=0) and want to predict using the site-specific posterior distributions. Would you sample like this?

# site_shared = 0 prediction, set site_shared to 0 for each observation and sample from original trace
with model_factory(X = new_site_X,
                   y = np.empty(ntrain_obs, dtype=np.int32),
                   site_shared = np.array([0]*ntest_obs), # Only use betas for site==0
                   n_site = ntrain_site) as lvl0_model:

  ppc_level0 = pm.sample_posterior_predictive(trace=train_trace, # Use all levels of the trace
                                           samples=1000)
1 Like