Predict on unseen group in hierarchical model

Hi,
Note: this topic is very similar to this thread (How do we predict on new unseen groups in a hierarchical model in PyMC3?), which I read and adapted to my particular model.
I have a hierarchical model with e.g. 2 levels of hierarchy: clusters and instances.
I define the model like this:

def model_factory(y, cluster, instance, n_clusters, n_instances):
    with pm.Model() as model:
        # Cluster-level hyperpriors
        a = pm.Normal('a', mu=0., sigma=3., shape=n_clusters)
        sigma_a = pm.Exponential('sigma_a', 1., shape=n_clusters)

        # Non-centered parametrization of the instance-level means:
        # mu ~ Normal(a[cluster], sigma_a[cluster])
        mu_tilde = pm.Normal('mu_t', mu=0., sigma=1., shape=n_instances)
        mu = pm.Deterministic('mu', mu_tilde * sigma_a[cluster] + a[cluster])

        # Instance-level observation noise and likelihood
        sigma = pm.Exponential('sigma', 1., shape=n_instances)
        data_obs = pm.Normal('data', mu=mu[instance], sigma=sigma[instance], observed=y)
    return model

You can see that I have a hyperprior for the clusters. In each cluster, there can be multiple instances that use the prior of their cluster.
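To illustrate the indexing (hypothetical values): cluster has one entry per instance, and instance has one entry per observation. For example, with 2 clusters holding 3 and 2 instances and 2 observations per instance, the index arrays would be:

cluster  = [0, 0, 0, 1, 1]                   # maps each instance to its cluster
instance = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]    # maps each observation to its instance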
For example, I create toy data like this:

np.random.seed(123)

N_clusters = 3  # Number of clusters
N_instances = [10, 5, 1]  # Number of instances per cluster
total_instances = sum(N_instances)
N = 100  # Number of samples per instance
cluster_means = [1., 2., 3.]  # Mean of means within cluster
cluster_means_std = [0.3, 0.1, 0.1]  # Std of means within cluster
std = 0.5  # Observation noise within each instance

data = []
true_means = []
for i in range(N_clusters):
    if N_instances[i] > 0:
        # Draw this cluster's instance means, then N samples per instance
        means = np.random.normal(loc=cluster_means[i], scale=cluster_means_std[i], size=N_instances[i])
        true_means = np.append(true_means, means)
        data.append(np.array([np.random.normal(means[j], std, N) for j in range(N_instances[i])]))
data = np.vstack(data).reshape(-1)

# Index arrays: cluster per instance, instance per observation
clusters = []
for i in range(N_clusters):
    clusters += [i] * N_instances[i]
instance = np.repeat(np.arange(total_instances), N)

i.e., I have 3 clusters, each cluster has a certain number of instances, and each instance has 100 data points.

Now I am fitting the model to this data:

with model_factory(data, clusters, instance, N_clusters, total_instances):
    trace = pm.sample(tune=2000, target_accept=0.99)
    idata = az.from_pymc3(trace)

Now let’s assume I want to predict the data for an unseen instance of cluster 1. Following the thread mentioned above, I am trying to do it like this:

# Extract the cluster-level posterior samples as a list of dicts
df = pm.trace_to_dataframe(trace,
                           varnames=['a', 'sigma_a'],
                           include_transformed=True)
# New model: 1 unseen instance (index 0) belonging to cluster 1
with model_factory(0, [1], [0], 3, 1):
    post_pred = pm.sample_posterior_predictive(df.to_dict('records'))

That is, I am only taking the parameters at the cluster level (a and sigma_a) from the posterior trace and specifying that my instance (0) belongs to cluster 1. However, this leads to the following error:
ValueError: 0-dimensional argument does not have enough dimensions for all core dimensions ('i_0_0',)

Note that if I change the number of instances to total_instances in the call above, i.e.:

with model_factory(0, [1], [0], 3, total_instances):
    post_pred = pm.sample_posterior_predictive(df.to_dict('records'))

I don’t get any error, but the posterior prediction seems to ignore the posterior samples for a and sigma_a and instead samples from the hyperprior (mean=0, sigma=3) specified above. Moreover, it doesn’t make sense to me to specify the total number of instances here, because I want to create a new model with an arbitrary number of instances that just uses the learned cluster-level parameters from the fit.
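A quick way to see this (a hypothetical sanity check, using the toy values above): if the fitted posterior were used, the predictive draws for an instance of cluster 1 should be centered near 2.0 (its true mean), but instead they are centered near the hyperprior mean of 0:

# Hypothetical check: the predictive mean should be near 2.0 (true mean of
# cluster 1) if the fitted posterior were used; instead it is near 0
print(post_pred['data'].mean(), post_pred['data'].std())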


Does anyone have an idea how to solve this? It seems that using a hierarchical model to make predictions for unseen instances of a cluster is a common use case, so there must be a way to do this with PyMC3 and I’m probably doing something stupid here.

Thanks to @lucianopaz’s talk about posterior predictions (Posterior Predictive Sampling in PyMC3 by Luciano Paz), I made some progress. Essentially, I changed the way I extract the cluster-level posterior samples from the trace: instead of extracting the values I need, I remove the values I don’t need:

# Drop the instance-level variables from the trace so that
# sample_posterior_predictive draws fresh values for them
trace.remove_values('mu_t')
trace.remove_values('mu')
# New model: 1 unseen instance (index 0) belonging to cluster 1
with model_factory(0, [1], [0], 3, 1):
    post_pred = pm.sample_posterior_predictive(trace)

This method was explained in the talk.

With this method, the model samples the mu for my unseen instance from the parameter distribution of the cluster it belongs to (cluster 1 in my sample code above).

However, when I predict for multiple unseen instances from the same cluster and look at the spread of the mu values, it does not follow the distribution implied by the model definition.
The model specifies mu ~ Normal(a, sigma_a), so when I predict for multiple unseen instances, I would expect the spread of mu values across those instances to be approximately sigma_a. However, the observed spread is much smaller.
Any idea what the cause could be?
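For reference, here is roughly how I check this (a sketch under the assumptions of the toy example above; passing deterministics via var_names to sample_posterior_predictive is assumed to be supported by your PyMC3 version):

# Predict for 20 unseen instances of cluster 1 in one call, then compare the
# per-draw spread of mu across instances with sigma_a of that cluster
n_new = 20
with model_factory(np.zeros(n_new), [1] * n_new, np.arange(n_new), 3, n_new):
    post_pred = pm.sample_posterior_predictive(trace, var_names=['mu'])

per_draw_spread = post_pred['mu'].std(axis=1)  # std across instances, per draw
print('mean spread of mu:', per_draw_spread.mean())
print('posterior mean of sigma_a[1]:', trace['sigma_a'][:, 1].mean())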


Thanks for tagging me here @mschmidt87, I had completely missed your post. I’ll take a look this afternoon and see if I can find anything that can be done


@mschmidt87, sorry for the delay. I was finally able to look into your problem today.

Unfortunately, you’ve run into an inconsistent shape behavior that we have been discussing internally just last week. Your particular problem is that the variable:

mu_tilde = pm.Normal('mu_t', mu=0., sigma=1., shape=n_instances)

ends up having shape=1. This shouldn’t be a problem, but for some reason the pymc3 distribution’s random method was implemented in a way that handles this particular shape inconsistently. When you call mu_tilde.random(size=something), you end up with an array of shape (something,). The important thing to notice is that if pymc3 handled things consistently, the result would have shape (something, 1). This strange behavior is what is ultimately causing your problem, because theano is very strict about the number of dimensions its variables must have.
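A minimal sketch of the inconsistency (variable names are hypothetical):

import pymc3 as pm

with pm.Model():
    x1 = pm.Normal('x1', 0., 1., shape=1)
    x2 = pm.Normal('x2', 0., 1., shape=2)

print(x1.random(size=5).shape)  # (5,)   -- the length-1 core dimension is dropped
print(x2.random(size=5).shape)  # (5, 2) -- core dimension kept, as expected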

The chain of events that leads to your error is the following:

  1. Call sample_posterior_predictive
  2. Try to draw_values from data_obs
  3. data_obs tries to draw_values from its parameters, which in turn tries to draw from mu.
  4. mu is a theano symbolic tensor that must be compiled into a function and evaluated on some concrete input values to return an array of output values.
  5. During compilation, it sees that its input, mu_tilde, should have ndim==1 (it should be a vector).
  6. The concrete input values for mu_tilde, sigma_a and a are successfully drawn from their distributions’ random methods, and their shapes are (), (n_clusters,) and (n_clusters,) respectively. Notice how mu_tilde should have had shape (1,) but doesn’t!
  7. When we try to evaluate the compiled theano function on the supplied input values, the first input doesn’t have the required number of dimensions, which raises the exception you saw.

At the moment, the only way around this error is to draw 2 or more instances at the same time :man_facepalming:
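For example, something like this should work (a sketch; the second instance is a throwaway duplicate assigned to the same cluster):

# Workaround sketch: request 2 instances (both in cluster 1) so the
# instance-level variables avoid the problematic shape=1, then keep
# only the predictions for the first instance
with model_factory(np.zeros(2), [1, 1], [0, 1], 3, 2):
    post_pred = pm.sample_posterior_predictive(trace)
pred = post_pred['data'][:, 0]  # predictions for the instance of interest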

We have noticed this strange behavior a few times but haven’t addressed it because fixing it would break backwards compatibility. We do intend to change this entirely in the next major release, but we don’t have a concrete estimate of when that could be. Anyway, I’ll open an issue on GitHub so that we don’t lose track of the two inconsistent behaviors that random has.


Here’s the link to the newly opened pymc3 issue: https://github.com/pymc-devs/pymc3/issues/4206


Thank you very much for your reply @lucianopaz.

If I understand correctly, this explains why I needed to specify more than 1 instance. However, in my last post, I had used only 1 instance and that did not lead to an error.

Also, do you have an idea why the posterior predictions don’t seem to pick up the learned sigma_a from the trace? (As described in my last post.)