Difference between unpooled model and partial pooling?

sanket · August 8, 2024, 6:43am

Hi, I am trying to understand how PyMC works. I tried implementing the models from "A Primer on Bayesian Methods for Multilevel Modeling" in the PyMC example gallery.
Suppose my data frame has the columns city, date, and number of accidents.
Now if we look at the unpooled model:

and the partial pooling model:

Apart from the generic prior, both seem to be the same: in the partial pooling model each city's parameter gets a prior, and in the unpooled model each city's parameter gets a prior as well. The generic prior at the top might not be making any difference. The code looks roughly like this:

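Partial pooling model:

import pymc as pm

with pm.Model(coords=coords) as partial_pooling:
    # Hyperpriors shared by all cities
    mu = pm.Normal("mu", 0, sigma=10)
    sigma = pm.HalfCauchy("sigma", 10)

    # Per-city means, drawn from the shared hyperpriors
    mu_cid = pm.Normal("mu_city", mu=mu, sigma=sigma, dims="city")
    # Observation noise, shared across cities
    sigma_cid = pm.HalfCauchy("sigma_city", 10)

    # adinstid: presumably the integer city index of each row of small_df
    y = pm.Normal("city_mean", mu=mu_cid[adinstid], sigma=sigma_cid, observed=small_df["accidents"])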

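Unpooled model:

with pm.Model(coords=coords) as unpooled_model:
    # One independent prior per city; no shared hyperparameters on the means
    mu = pm.Normal("mu", 0, sigma=10, dims="city")
    # Observation noise, shared across cities
    sigma = pm.HalfCauchy("sigma", 10)
    # city: presumably the integer city index of each row of small_df
    y = pm.Normal("city_mean", mu=mu[city], sigma=sigma, observed=small_df["accidents"])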

Can someone explain what I am missing?

ricardoV94 · August 8, 2024, 7:55am

In the partially pooled model, information flows across groups through those very hyperparameters. For instance, the model can learn that most groups behave very similarly (which shows up as a small posterior for the sigma hyperparameter). That in turn makes it believe that a specific group with little data is also likely to be similar to the rest, given the homogeneity across most groups.

This reasoning also extends to new groups out-of-sample. Would you rather start with an uninformative prior for a new group or believe the posterior mean/std of all the in-sample groups to be a better initial guess?

The model can also learn groups are very distinct (translated into a large posterior hyperparameter sigma) and arrive at similar conclusions as the unpooled model. But unlike the unpooled model it is not structurally “destined” to ignore what’s going on with other groups. It can learn to do so if the data argues for it.
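
The two regimes described above can be sketched numerically. This is a minimal illustration, not the model from the thread: it assumes the hyperparameters are known and fixed (in the real model they are inferred), and it uses the standard normal-normal result that a group's posterior mean is a precision-weighted average of its own sample mean and the population mean. The function name and all numbers are made up for illustration; `tau` plays the role of the "sigma" hyperparameter and `sigma` the role of the observation noise.

```python
# Conjugate normal-normal sketch with the hyperparameters held fixed.
# All names and numbers here are hypothetical, chosen only for illustration.

def partial_pooling_mean(ybar, n, mu, tau, sigma):
    """Posterior mean of one group's mean, given fixed mu, tau and sigma."""
    data_precision = n / sigma**2    # how much the group's own data says
    prior_precision = 1.0 / tau**2   # how tightly groups cluster around mu
    weight = data_precision / (data_precision + prior_precision)
    return weight * ybar + (1.0 - weight) * mu

mu, sigma = 10.0, 4.0   # population mean and within-city noise sd
ybar = 20.0             # sample mean observed in the city of interest

results = {}
for tau in (0.5, 50.0):      # small tau: similar cities; large tau: distinct cities
    for n in (2, 200):       # few vs. many observations in this city
        results[(tau, n)] = partial_pooling_mean(ybar, n, mu, tau, sigma)
        print(f"tau={tau:>4}, n={n:>3}: estimate = {results[(tau, n)]:.2f}")
```

With a small `tau`, the city with only two observations is pulled almost all the way to the population mean, while the city with 200 observations mostly keeps its own mean. With a large `tau`, both estimates stay close to the city's sample mean, which is what the unpooled model would report.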

The Statistical Rethinking book has a great chapter with a fish example that illustrates this clearly.

sanket · August 8, 2024, 8:13am

Thanks for the reply, @ricardoV94.
So you mean that even though the plate notations look similar, there is an information flow across cities in the partial pooling model that does not happen in the unpooled model? If I want to visualise the information sharing, how can I do that?
Thanks once again, I am just trying to understand how PyMC works.

ricardoV94 · August 8, 2024, 1:33pm

The easiest way is to fit the model with and without pooling; you'll see the implied "shrinkage" of the group parameters towards the overall mean across groups. The chapter from the book I mentioned is really good here.

In graphical form, the unpooled model has a batch of 61 independent parameters. Once you give them a shared hyperparameter they become related in the posterior, because the hyperparameter has to find the best compromise with all of the individual parameters, and each individual parameter has to find the best compromise with the hyperparameter. Because of this mutual dependency the group parameters end up influencing each other indirectly.

In fact anytime you can link information from two parameters, no matter how separated, there will be information sharing in the posterior (batches of independent dimensions are not linked, but the graphviz does not allow you to distinguish this because it doesn’t show each individual connection).
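
One way to see this dependency is to write out how the two posteriors factorize. The notation below is an assumption made for this sketch, not taken from the thread: $\mu_j$ is the mean of city $j$, $c_i$ is the city index of observation $i$, $\sigma$ is the observation noise, and $\mu_0$, $\tau$ are the hyperparameters (called "mu" and "sigma" in the partial pooling code).

```latex
% Unpooled: each \mu_j has its own fixed prior p(\mu_j).
p(\mu_{1:J}, \sigma \mid y) \;\propto\;
  p(\sigma) \prod_{j=1}^{J} \Big[ p(\mu_j)
  \prod_{i \,:\, c_i = j} \mathcal{N}(y_i \mid \mu_j, \sigma) \Big]

% Partially pooled: every \mu_j shares the terms \mu_0 and \tau.
p(\mu_{1:J}, \mu_0, \tau, \sigma \mid y) \;\propto\;
  p(\mu_0)\, p(\tau)\, p(\sigma) \prod_{j=1}^{J} \Big[
  \mathcal{N}(\mu_j \mid \mu_0, \tau)
  \prod_{i \,:\, c_i = j} \mathcal{N}(y_i \mid \mu_j, \sigma) \Big]
```

In the unpooled expression, once $\sigma$ is fixed, the product splits into one factor per city, so no $\mu_j$ carries information about another. In the partially pooled expression every factor contains the same $\mu_0$ and $\tau$, so the cities are coupled through them. Note that the shared $\sigma$ in the unpooled code is itself a weak link between cities, which is consistent with the point above that any shared parameter leads to some information sharing.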

jessegrabowski · August 8, 2024, 2:43pm

This video is an excellent introduction to the subject, with a lot of great visualizations.

sanket · August 14, 2024, 7:20pm

Thanks a lot for the details. It helped me understand the difference.

sanket · August 14, 2024, 7:34pm

Just one more question on the model above. As you can see, we are defining the "dims", but when we pass the data frame we only use the "accidents" column. How does the model determine which accident count belongs to which city? When I sample, I expect N samples for each city, but that is not the case.
If we check the posterior predictive, city_mean_dim_2 has length 110. I don't understand why it is not equal to the number of cities. Am I missing something here?
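
A minimal NumPy sketch of the indexing step may help here. It assumes that `adinstid` (or `city`) is an integer array with one city code per row of the data frame; the city labels and values below are hypothetical. Indexing a per-city vector with that array yields one entry per row, so the observed variable "city_mean" and its posterior predictive would have one entry per observation rather than per city. That would explain a length of 110 if `small_df` has 110 rows, which the thread does not state.

```python
import numpy as np

# Hypothetical stand-in for the city column: one label per observed row.
city_labels = np.array(["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"])

# Sorted unique names (usable as coords["city"]) and an integer code per row.
city_names, city_idx = np.unique(city_labels, return_inverse=True)

# One value per city, standing in for a single posterior draw of "mu_city".
mu_city = np.array([3.0, 5.0, 7.0])  # order follows city_names

# Indexing expands the per-city vector to one entry per row of data.
mu_per_row = mu_city[city_idx]
print(city_names, city_idx, mu_per_row.shape)

# Row-level values can be summarised back to one number per city.
per_city = np.bincount(city_idx, weights=mu_per_row) / np.bincount(city_idx)
print(per_city)
```

Under these assumptions, the per-city quantities are found in the "mu_city" variable (which carries the "city" dimension), while row-level draws of "city_mean" can be averaged per city with the same index array, as in the last step.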
