Hi James,
I think you need to check the group mean to understand how shrinkage works in your model. But in your parametrization it’s not clear to me what the group mean is – probably a combination of alpha and beta. I think I’d use another parametrization to have a clear group mean. Something like:
with pm.Model() as model:
a = pm.Normal('a', 0., 1.) # group mean
sigma = pm.Exponential('sigma', 1.) # determines amount of shrinkage
a_cluster = pm.Normal('a_cluster', a, sigma, shape=2)
p = pm.math.invlogit(a_cluster[cluster_id])
pm.Binomial('obs', p=p, n=[6, 125], observed=[1, 110])
That way, it’s clear that a is the group mean, which individual-level parameters are shrunk to, and sigma is the amount of shrinkage (the smaller, the more similar the clusters, so the more shrinkage).
That being said, I think your model does shrink towards the group mean: the blue distribution is centered around 0.3, while the observed data are 0.16 (1/6). Of course, the orange one is centered on the observed data and much less wide because there is a lot more data to infer it.
If you’re looking for ressources, there is a whole chapter on hierarchical models and shrinkage in McElreath’s Rethinking – here is the port to PyMC (we’re working on porting the 2nd ed. but you’ll have to wait a bit
). And these models are the topic of the last episode of my podcast, with Thomas Wiecki.
Hope this helps 