Predict on unseen group in hierarchical model

Hi,
Note: this topic is very similar to this thread (How do we predict on new unseen groups in a hierarchical model in PyMC3?), which I read and adapted to my particular model.
I have a hierarchical model with e.g. 2 levels of hierarchy: clusters and instances.
I define the model like this:

import numpy as np
import pymc3 as pm
import arviz as az


def model_factory(y, cluster, instance, n_clusters, n_instances):
    with pm.Model() as model:
        # Cluster-level hyperpriors: mean and spread of the instance means
        a = pm.Normal('a', mu=0., sigma=3., shape=n_clusters)
        sigma_a = pm.Exponential('sigma_a', 1., shape=n_clusters)

        # Non-centered parameterization: mu ~ Normal(a[cluster], sigma_a[cluster])
        mu_tilde = pm.Normal('mu_t', mu=0., sigma=1., shape=n_instances)
        mu = pm.Deterministic('mu', mu_tilde * sigma_a[cluster] + a[cluster])

        # Instance-level observation noise and likelihood
        sigma = pm.Exponential('sigma', 1., shape=n_instances)
        data_obs = pm.Normal('data', mu=mu[instance], sigma=sigma[instance], observed=y)
    return model

You can see that I have a hyperprior for the clusters. In each cluster, there can be multiple instances, which share the cluster's prior.
I create toy data like this:

np.random.seed(123)

N_clusters = 3  # Number of clusters
N_instances = [10, 5, 1]  # Number of instances per cluster
total_instances = sum(N_instances)
N = 100 # Number of samples per instance
cluster_means = [1., 2., 3.]  # Mean of means within cluster
cluster_means_std = [0.3, 0.1, 0.1]  # Std of means within cluster
std = 0.5

data = []
true_means = []
for i in range(N_clusters):
    if N_instances[i] > 0:
        means = np.random.normal(loc=cluster_means[i], scale=cluster_means_std[i], size=N_instances[i])
        true_means = np.append(true_means, means)
        data.append(np.array([np.random.normal(means[j], std, N) for j in range(N_instances[i])]))
data = np.vstack(data).reshape(-1)
# Cluster id for each instance, and instance id for each data point
clusters = []
for i in range(N_clusters):
    clusters += [i] * N_instances[i]
instance = np.repeat(np.arange(total_instances), N)

i.e., I have 3 clusters and each cluster has a certain number of instances (and each instance has 100 data points).
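
As a quick sanity check, the resulting arrays should satisfy the following (the sizes follow from the settings above):

assert data.shape == (total_instances * N,)  # 16 instances * 100 samples = 1600
assert len(clusters) == total_instances      # one cluster id per instance
assert len(instance) == total_instances * N  # one instance id per data point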

Now I am fitting the model to this data:

with model_factory(data, clusters, instance, N_clusters, total_instances):
    trace = pm.sample(tune=2000, target_accept=0.99)
    idata = az.from_pymc3(trace)
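
As an optional check of the fit (not strictly needed for the question below), the recovered cluster-level parameters can be compared to the true values used to generate the data:

az.summary(idata, var_names=['a', 'sigma_a'])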

Now let’s assume I want to predict the data for an unseen instance of cluster 1. Following the thread mentioned above, I try to do it like this:

df = pm.trace_to_dataframe(trace,
                           varnames=['a', 'sigma_a'],
                           include_transformed=True)
with model_factory(0, [1], [0], 3, 1):
    post_pred = pm.sample_posterior_predictive(df.to_dict('records'))

That is, I am taking only the cluster-level parameters (a and sigma_a) from the posterior trace and specifying that my new instance (index 0) belongs to cluster 1. However, this leads to the following error:
ValueError: 0-dimensional argument does not have enough dimensions for all core dimensions ('i_0_0',)

Note that if I change the number of instances to total_instances in the call above, i.e.:

with model_factory(0, [1], [0], 3, total_instances):
    post_pred = pm.sample_posterior_predictive(df.to_dict('records'))

I am not getting any error, but the posterior predictive seems to ignore the posterior samples for a and sigma_a and instead samples from the hyperprior (mu=0, sigma=3) specified above. Moreover, it doesn’t make sense to me to specify the total number of instances here, because I want to create a new model with an arbitrary number of instances that just reuses the cluster-level parameters learned in the fit.
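
One quick way to see this (my own diagnostic, based on the toy data above): if the draws used the posterior, they should be centered near a[1] (about 2.0); if they come from the hyperprior N(0, 3), they are centered at 0 with a much larger spread.

print(post_pred['data'].mean(), post_pred['data'].std())       # predictive location and spread
print(trace['a'][:, 1].mean(), trace['sigma_a'][:, 1].mean())  # posterior for cluster 1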

Does anyone have an idea how to solve this? It seems that using a hierarchical model to make predictions for unseen instances of a cluster is a common use case, so there must be a way to do this with PyMC3 and I’m probably doing something stupid here.

Thanks to @lucianopaz’s talk about posterior predictions (Posterior Predictive Sampling in PyMC3 by Luciano Paz), I made some progress. Essentially, I changed the way I extract the cluster-level posterior samples from the trace: instead of extracting the values I need, I remove the variables I don’t need:

# Drop the instance-level variables; sample_posterior_predictive will
# resample anything that is missing from the trace
trace.remove_values('mu_t')
trace.remove_values('mu')
with model_factory(0, [1], [0], 3, 1):
    post_pred = pm.sample_posterior_predictive(trace)

This method was explained in the talk: variables that are missing from the trace are drawn afresh from their distribution, conditional on the parent values still in the trace, so the new instance’s mu is resampled while a and sigma_a keep their posterior values.

With this method, the model samples the mu for my unseen instance from the parameter distribution of the cluster it belongs to (cluster 1 in my sample code above).

However, when I predict for multiple unseen instances from the same cluster, the spread of the mu values does not follow the distribution implied by the model definition. The model specifies mu ~ Normal(a, sigma_a), so I would expect the spread of mu across those instances to be approximately sigma_a, but the observed spread is much smaller.
Any idea what the cause could be?
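
For concreteness, here is a sketch of the check I have in mind (the number of new instances is arbitrary, and recording the deterministic mu via var_names requires a recent PyMC3 version):

# mu_t and mu were already removed from the trace above; sigma is
# instance-level as well, so it must also be resampled for new instances
trace.remove_values('sigma')
n_new = 50  # hypothetical number of unseen instances, all from cluster 1
with model_factory(np.zeros(n_new), [1] * n_new, np.arange(n_new), 3, n_new):
    post_pred = pm.sample_posterior_predictive(trace, var_names=['mu', 'data'])

# Within each posterior draw, compare the spread of mu across the new
# instances to the posterior of sigma_a for cluster 1
print(post_pred['mu'].std(axis=1).mean())  # empirical spread across instances
print(trace['sigma_a'][:, 1].mean())       # posterior mean of sigma_a[1]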

Thanks for tagging me here @mschmidt87, I had completely missed your post. I’ll take a look this afternoon and see if I can find anything that can be done.
