Difference between unpooled model and partial pooling?

Hi, I am trying to understand how PyMC works. I tried implementing the models from "A Primer on Bayesian Methods for Multilevel Modeling" in the PyMC example gallery.
Suppose my data frame has the columns city, date, and number of accidents.
Now, if we look at the unpooled model:


and partial pooling model:

Apart from the generic prior, both seem to be the same: the partial pooling model gets its prior from one of the cities, and so does the unpooled model. The generic prior at the top might not be making any difference. The code looks roughly like this:

partial pooling model:

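with pm.Model(coords=coords) as partial_pooling:
    # Hyperpriors shared by all cities
    mu = pm.Normal("mu", 0, sigma=10)
    sigma = pm.HalfCauchy("sigma", 10)

    # One mean per city, all drawn from the shared hyperpriors
    mu_cid = pm.Normal("mu_city", mu=mu, sigma=sigma, dims="city")
    # Observation noise, a single value shared by all cities
    sigma_cid = pm.HalfCauchy("sigma_city", 10)

    # adinstid: integer city index of each row (same role as `city` in the unpooled model)
    y = pm.Normal("city_mean", mu=mu_cid[adinstid], sigma=sigma_cid, observed=small_df["accidents"])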

unpooled model:


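with pm.Model(coords=coords) as unpooled_model:
    # One independent mean per city, each with the same fixed prior
    mu = pm.Normal("mu", 0, sigma=10, dims="city")
    # Observation noise, a single value shared by all cities
    sigma = pm.HalfCauchy("sigma", 10)
    # city: integer city index of each row
    y = pm.Normal("city_mean", mu=mu[city], sigma=sigma, observed=small_df["accidents"])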

Can someone explain what I am missing?

In the partially pooled model, information flows across groups through those very hyperparameters. For instance, the model can learn that most groups behave very similarly (which translates into a small posterior for the sigma hyperparameter). That in turn makes it believe that a specific group with little data is also likely to be similar to the rest, given the homogeneity across most groups.

This reasoning also extends to new groups out-of-sample. Would you rather start with an uninformative prior for a new group or believe the posterior mean/std of all the in-sample groups to be a better initial guess?

The model can also learn that the groups are very distinct (which translates into a large posterior for the sigma hyperparameter) and arrive at conclusions similar to those of the unpooled model. But unlike the unpooled model, it is not structurally “destined” to ignore what’s going on with other groups. It can learn to do so if the data argues for it.
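
To make the two cases concrete, here is a minimal sketch of the shrinkage in a normal-normal model. It is a simplification, not what PyMC does internally: the hyperparameters are held fixed instead of being inferred, and all names and numbers (`partial_pooled_mean`, `hyper_mu`, `hyper_sigma`, the accident counts) are made up for illustration. With the hyperparameters fixed, the conditional posterior mean of a group is a precision-weighted average of that group's sample mean and the hyperprior mean.

```python
def partial_pooled_mean(group_mean, n_obs, obs_sigma, hyper_mu, hyper_sigma):
    """Conditional posterior mean of one group's mean in a normal-normal model.

    Assumes the observation noise and both hyperparameters are known,
    which the real hierarchical model does not assume.
    """
    data_precision = n_obs / obs_sigma**2      # how much the group's own data says
    prior_precision = 1.0 / hyper_sigma**2     # how tightly the groups cluster
    weight = data_precision / (data_precision + prior_precision)
    return weight * group_mean + (1.0 - weight) * hyper_mu


# Hypothetical city with only two observations and a high sample mean.
group_mean = 20.0
n_obs = 2
obs_sigma = 5.0
hyper_mu = 10.0  # overall mean across cities

# Small hyper_sigma: cities look alike, so the estimate is pulled to hyper_mu.
homogeneous = partial_pooled_mean(group_mean, n_obs, obs_sigma, hyper_mu, hyper_sigma=1.0)

# Large hyper_sigma: cities differ a lot, so the estimate stays near the data.
heterogeneous = partial_pooled_mean(group_mean, n_obs, obs_sigma, hyper_mu, hyper_sigma=50.0)

print(f"homogeneous groups:   {homogeneous:.2f}")
print(f"heterogeneous groups: {heterogeneous:.2f}")
```

With these numbers the first estimate lands close to the overall mean of 10, while the second stays close to the city's own sample mean of 20, which is essentially what the unpooled model would report.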

The Statistical Rethinking book has a great chapter with a fish example that illustrates this clearly.

Thanks for the reply @ricardoV94
So you mean that, even though the plate notations look similar, there is an information flow across cities in the partial pooling model that does not happen in the unpooled model? If I want to visualise the information sharing, how can I do that?
Thanks once again, I am just trying to understand how PyMC works.

The easiest way is to fit the model with and without pooling; you’ll see the implied “shrinkage” of the group parameters towards the overall mean across groups. The chapter from the book I mentioned is really good here.

In graphical form, the unpooled model has a batch of 61 independent parameters. Once you give them a shared hyperparameter they become related in the posterior, because the hyperparameter has to find the best compromise with all of the individual parameters, and each individual parameter has to find the best compromise with the hyperparameter. Because of this mutual dependency, the group parameters end up influencing each other indirectly.

In fact, anytime you can link information from two parameters, no matter how far apart they are in the graph, there will be information sharing in the posterior. Batches of independent dimensions are not linked, but the Graphviz plot does not let you see this, because it doesn’t show each individual connection.

This video is an excellent introduction to the subject, with a lot of great visualizations.

Thanks a lot for the details. It helped me understand the difference.

Just one more question on the model above. As you can see, we are defining the “dims”, but when we pass the data frame we only use the “accidents” column. How does the model determine which accident count belongs to which city? When I sample, I expected to get N samples for each city, but that is not the case.
If we check the posterior predictive, it has city_mean_dim_2 equal to 110. I don’t understand why it’s not equal to the number of cities. Am I missing something here?
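
A plain-Python sketch of the indexing may help with this question. The link between rows and cities comes from the integer index array used in `mu[city]` (or `mu_cid[adinstid]`), not from the observed column. The observed variable `city_mean` therefore has one entry per row of the data, and the posterior predictive draws one value per row, so its last dimension would be the number of observations (presumably 110 rows in `small_df`) and not the number of cities. The data below are hypothetical, and the label-to-code mapping is only similar in spirit to what `pandas.factorize` would give.

```python
# Hypothetical stand-in for small_df: one row per (city, date) observation.
cities_per_row = ["A", "B", "A", "C", "B", "A"]
accidents = [3, 7, 4, 1, 6, 5]

# Build an integer code for every city label, then one code per row.
city_labels = sorted(set(cities_per_row))
label_to_code = {label: code for code, label in enumerate(city_labels)}
city_idx = [label_to_code[c] for c in cities_per_row]

# One parameter per city: this is the length that dims="city" controls.
mu_city = [4.0, 6.5, 1.0]

# Expanded to one value per row: the plain-Python analogue of mu_city[city_idx].
mu_per_row = [mu_city[i] for i in city_idx]

print("cities:", len(mu_city))                  # length of the per-city parameter
print("observations:", len(mu_per_row))         # length of the likelihood / predictions
print("index per row:", city_idx)
```

Per-city summaries live in the per-city parameter (`mu_city` in the partial pooling model, `mu` in the unpooled one). To get predictions grouped by city, the per-row draws can be grouped using the same index array.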