Hi all,
I know it's been a few weeks, but I wanted to follow up on this thread. I was able to spend some time on a few variations of the problem, and I reused some of @Benjamin's code to write an example with a few variations of out-of-model prediction.
Just to recap, I have done out-of-sample prediction in a hierarchical model before by using `pm.Data` in the model and then `pm.set_data` to swap the data in and make predictions. However, I believe this only works when the group index already exists within the hierarchical structure (i.e., if I originally have 10 groups, there is no 14th group to look up in the original data). Here are some things I tried and some observations; I would love to get some thoughts on whether this is the correct interpretation of what I observed. The basic model looks like this:
```python
import numpy as np
import pymc as pm

# `frame` has one row per observation: "idx" holds the integer group
# index and "data" the observed value.
coords = {"idx": np.unique(frame["idx"])}
with pm.Model(coords=coords) as model:
    # Data
    g = pm.Data("g", frame["idx"].to_numpy())
    y = pm.Data("y", frame["data"].to_numpy())
    # Population mean
    mu = pm.Normal("mu", 0., 5.)
    # Across-group variability of the mean
    tau = pm.HalfNormal("tau", 3.)
    # Population standard deviation
    sigma = pm.HalfNormal("sigma", 3.)
    # Group-specific mean (non-centered)
    eps_mu_g = pm.Normal("eps_mu_g", dims="idx")
    mu_g = pm.Deterministic("mu_g", mu + tau * eps_mu_g, dims="idx")
    # Observed data
    pm.Normal("y_obs", mu=mu_g[g], sigma=sigma, observed=y)
```
`mu_g` and `eps_mu_g` have shape equal to the number of groups. Here is what I tried:

1. In my first attempt, I only used `pm.Flat` for the params that are not group-specific, so `mu`, `tau`, and `sigma`. When sampling the posterior predictive, I found that the distribution was the same for all groups, since no group-specific information was being used. It confused me at first but makes perfect sense now. I thought that maybe some information would leak in from the backend, but I realized I never told PyMC where to look: my non-centered param `eps_mu_g_new` did not have the same name, so the variable did not conflict with anything in the original inference data.
2. In my second attempt I still used `pm.Flat` for `mu`, `sigma`, and `tau`, but this time I explicitly defined `eps_mu_g_new` with mu/sigma taken directly from the inference data summary. New groups were assigned `mu=0` and `sigma=3`. Now the model was able to use that previously learned information, but after the fact I realized that this still does not mean the model sees the group-specific params in the original inference data.
3. In this attempt, I used an entirely new model and did not use `pm.Flat`. All of the variables were assigned `mu` and `sigma` values taken from the inference data summary, and new groups got the same `mu=0` and `sigma=3` as above. In this case I was able to fully sample the new model, since `pm.Flat` was not used to define any of the variables.
4. In the last attempt I followed the example from PyMC Labs, where I used the old variable names and `pm.Flat` to tell the model to look at the inference data for existing groups, and then defined a new variable for the new groups. Those were concatenated and indexed by group number as usual.
Obviously, (1) does not yield meaningful predictions on new groups, since no group-specific information carries over. Interestingly (or perhaps not to others), (2)-(4) gave me nearly identical predictions within each group.
One thing that I am concerned about is how models (2) and (4) can see any of the new data other than the group index it belongs to. What I mean is that since we only sample the posterior predictive, the only value being used is the one used to index the `mu_g` param; the raw data for the new groups is completely unobserved. In example (3), we sample a completely new model, and all data are passed in as observed when defining the mean of the observed variable, `y_obs`.
I can see how (2) and (4) are advantageous when I only have some predicted group means for unseen groups where I can pass in the priors, such as the example from PyMC Labs. However, when I have actual observed data that arrives at a later time, is the only way for the model to see the values of those observed data to train a new model and use the summary of the previous inference as priors? It feels like I am throwing away important information by computing just the mean/std of the results and then using those as priors.
I think that in the case where I have few observations on new groups, the raw data will not matter much anyway, since individual group means with insufficient data will be shrunk toward the pooled mean. I am more concerned about cases where the new data are sufficiently different that the group mean would be pulled away from the pooled mean if it had been part of the original sampling. Even then, I still want this new group to abide by the constraints imposed by the hierarchical model.
Thanks!