Partial pooling: Understanding sampling efficiency and "scale dependence"

Benjamin · March 13, 2024, 1:56pm

I haven’t been able to fully wrap my head around the “simpson paradox” example yet. For the data generating process in the example it seems to be quite challenging to set up a straightforward (hierarchical) model that has no divergences/low ESS values for certain variables, cf. discussion in Simpsons paradox and mixed models - large number of groups - #16 by aseyboldt

In order to understand this I have tried to simplify the data generation process as follows. I am simply drawing (univariate) random normal variates having a group-specific mean \mu_g and a variance \sigma^2 (the same for all groups for the sake of simplicity). The group-specific mean is random normally distributed with location (mean) \mu and variance \omega^2.

In other words I have an intra-group variability \sigma and an inter-group variability \omega in my model.

To start with I set \mu=2, \sigma=0.75, and \omega=0.1, i.e. the inter-group variability (of the mean) is significantly smaller than the intra-group variability of observations (around the group mean).

For later experiments I can change num_groups, num_samples_per_group, and omega in my data generating process.

def generate_data(num_groups: int = 5, num_samples_per_group: int = 20, omega: float = 0.1) -> pd.DataFrame:
    rng = np.random.default_rng(seed=41)
    mu = 2.0
    # intra-group varability of observations, same for all groups
    sigma = 0.75
    # inter-group variability in location mu_group
    # omega = 0.1
    result = {"group": [], "observation": []}
    for i in range(num_groups):
        mu_group = rng.normal(loc=mu, scale=omega)
        result["group"] += [i] * num_samples_per_group
        result["observation"] += rng.normal(loc=mu_group, scale=sigma, size=num_samples_per_group).tolist()
    return pd.DataFrame(result)

The hierarchical model has population level variables \mu, \sigma, and \omega. It does not model a group specific variability of observations, i.e. \sigma_g. The group specific mean is modelled in the “standard” way (i.e. essentially copy & paste from the examples above).

def make_model_v1(frame: pd.DataFrame) -> pm.Model:
    """No group-specific volatility in the observation generating process, i.e. 'sigma_group = sigma_population'."""
    coords = {"group": frame["group"].unique()}
    with pm.Model(coords=coords) as model:
        # Data
        y = pm.MutableData("y", frame["observation"])
        g = pm.MutableData("g", frame["group"])

        # Population level parameters
        mu = pm.Normal("mu")
        sigma = pm.HalfNormal("sigma")
        omega = pm.HalfNormal("omega", 10.0)

        # Group level parameters
        epsilon_mu_g = pm.ZeroSumNormal("epsilon_mu_g", dims="group")
        mu_g = pm.Deterministic("mu_g", mu + omega * epsilon_mu_g)

        # Likelihood
        pm.Normal("y_obs", mu=mu_g[g], sigma=sigma, observed=y)
    return model

For omega=0.1 the model seems to behave perfectly reasonable, e.g. increasing num_groups “narrows” the posterior on the true values for \mu, \sigma, and \omega`, cf. table below

	omega (true value)	num_groups	num_samples_per_group	parameter	mean	hdi_3%	hdi_97%	ess_bulk	ess_tail	r_hat
0	0.1	5	20	mu	1.902	1.762	2.024	2382	1913	1
1	0.1	5	20	omega	0.238	0	0.587	883	980	1.01
2	0.1	5	20	sigma	0.706	0.607	0.801	2842	2585	1
3	0.1	50	20	mu	1.985	1.94	2.031	4798	2500	1
4	0.1	50	20	omega	0.071	0	0.137	1266	2026	1
5	0.1	50	20	sigma	0.772	0.739	0.806	4429	2337	1
6	0.1	500	20	mu	1.995	1.98	2.009	8185	2322	1
7	0.1	500	20	omega	0.096	0.07	0.119	1482	1985	1
8	0.1	500	20	sigma	0.751	0.74	0.761	5512	2852	1

One can also observe that the ESS increases with increasing number of groups. My interpretation is that the only evidence for omega is the between group variability and for a small number of groups there is just not much evidence that can be gathered. This in turn leads to a small ESS, especially for omega.

This is somewhat counterintuitive to the “there is no minimum sample size for Bayesian data analysis” statement since the presence of somewhat low ESS leaves me (as someone getting just started) doubtful that I will be able to trust my hierarchical model (due to problems with posterior sampling) in situations where I don’t control the data generating process. I guess I am overlooking something here though, but any input would be very welcome.

Now if I increase omega by a factor of ten and run the same experiment, then I observe that the posterior “hdi” does not contain the true value for 5 and 50 groups and the ESS values for omega drop significantly compared to the ones for the experiment with omega=0.1,

	omega (true value)	num_groups	num_samples_per_group	parameter	mean	hdi_3%	hdi_97%	ess_bulk	ess_tail	r_hat
0	1	5	20	mu	1.746	1.606	1.872	3210	1771	1
1	1	5	20	omega	1.112	0.413	2.077	660	841	1.01
2	1	5	20	sigma	0.706	0.61	0.802	2523	2146	1
3	1	50	20	mu	1.861	1.817	1.909	6323	2901	1
4	1	50	20	omega	0.922	0.737	1.117	501	670	1.01
5	1	50	20	sigma	0.773	0.741	0.807	4813	2549	1
6	1	500	20	mu	1.988	1.975	2.002	9388	2736	1
7	1	500	20	omega	0.991	0.93	1.051	264	665	1
8	1	500	20	sigma	0.75	0.741	0.761	7152	2465	1

For omega=10, the results look as follows

	omega (true value)	num_groups	num_samples_per_group	parameter	mean	hdi_3%	hdi_97%	ess_bulk	ess_tail	r_hat
0	10	5	20	mu	0.149	0.018	0.288	1905	1915	1
1	10	5	20	omega	8.656	3.897	13.736	594	879	1.01
2	10	5	20	sigma	0.705	0.615	0.802	1541	1921	1
3	10	50	20	mu	0.623	0.578	0.672	3588	2597	1
4	10	50	20	omega	8.89	7.173	10.57	132	127	1.03
5	10	50	20	sigma	0.772	0.741	0.804	3405	2963	1
6	10	500	20	mu	1.922	1.909	1.936	939	997	1.01
7	10	500	20	omega	9.888	9.433	10.931	5	12	2.11
8	10	500	20	sigma	0.75	0.741	0.761	414	853	1

I am not entirely sure what I should make of this. I could say that the standard error of the estimate for mu is essentially omega / sqrt(num_groups) and the deviation between the posterior mean and the true value seems to be compatible with that number. Also if omega is larger than sigma does it make sense to talk about a (group-level) random effect, there is not much that those groups are sharing (in terms of the average of the observations within each group).

I’d be glad for any suggestions how to reconcile this intuition with the observed posterior summaries.

Topic		Replies	Views
Simpsons paradox and mixed models - large number of groups v5	18	419	March 14, 2024
Basic population and group level estimation example: suggestions for improvements? version agnostic model-checking	6	623	April 8, 2023
Feedback on a Hierarchical Model for Learning Shared Coefficient of Variation v5 modeling , hierarchical	4	234	February 29, 2024
Reparameterization of 3-level hierarchical bayesian model Questions	9	2289	July 12, 2022
Problems with posteriors of parameters in Bayesian hierarchical model (partial pooling) v5	0	237	August 8, 2023

Partial pooling: Understanding sampling efficiency and "scale dependence"

Related topics