I haven’t been able to fully wrap my head around the “Simpson’s paradox” example yet. For the data-generating process in that example it seems quite challenging to set up a straightforward (hierarchical) model that has no divergences and no low ESS values for certain variables, cf. the discussion in Simpsons paradox and mixed models - large number of groups - #16 by aseyboldt

To understand this I have tried to simplify the data-generating process as follows. I simply draw (univariate) normal random variates with a group-specific mean \mu_g and a variance \sigma^2 (the same for all groups for simplicity). The group-specific mean is itself normally distributed with mean \mu and variance \omega^2.

In other words I have an intra-group variability \sigma and an inter-group variability \omega in my model.
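Written out, the data-generating process is

$$
y_{g,i} \sim \mathcal{N}(\mu_g, \sigma^2), \qquad \mu_g \sim \mathcal{N}(\mu, \omega^2),
$$

so the marginal variance of a single observation is \sigma^2 + \omega^2.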

To start with I set \mu=2, \sigma=0.75, and \omega=0.1, i.e. the inter-group variability (of the mean) is significantly smaller than the intra-group variability of observations (around the group mean).

For later experiments I can change `num_groups`, `num_samples_per_group`, and `omega` in my data-generating process.

```python
import numpy as np
import pandas as pd


def generate_data(num_groups: int = 5, num_samples_per_group: int = 20, omega: float = 0.1) -> pd.DataFrame:
    rng = np.random.default_rng(seed=41)
    mu = 2.0
    # intra-group variability of observations, same for all groups
    sigma = 0.75
    # omega: inter-group variability in location mu_group
    result = {"group": [], "observation": []}
    for i in range(num_groups):
        mu_group = rng.normal(loc=mu, scale=omega)
        result["group"] += [i] * num_samples_per_group
        result["observation"] += rng.normal(loc=mu_group, scale=sigma, size=num_samples_per_group).tolist()
    return pd.DataFrame(result)
```
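As a quick sanity check of this process (a standalone sketch, independent of the model code): with many groups the empirical spread of the group means should approach \omega, and the pooled spread of all observations should approach \sqrt{\sigma^2 + \omega^2}:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, omega = 2.0, 0.75, 0.1
num_groups, num_samples_per_group = 10_000, 20

# Draw one mean per group, then observations around each group mean
mu_group = rng.normal(loc=mu, scale=omega, size=num_groups)
obs = rng.normal(loc=mu_group[:, None], scale=sigma,
                 size=(num_groups, num_samples_per_group))

print(mu_group.std())  # close to omega = 0.1
print(obs.std())       # close to sqrt(sigma**2 + omega**2) ~ 0.757
```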

The hierarchical model has population-level variables \mu, \sigma, and \omega. It does not model a group-specific variability of observations, i.e. there is no \sigma_g. The group-specific mean is modelled in the “standard” way (i.e. essentially copy & paste from the examples above).

```python
import pandas as pd
import pymc as pm


def make_model_v1(frame: pd.DataFrame) -> pm.Model:
    """No group-specific volatility in the observation-generating process, i.e. 'sigma_group = sigma_population'."""
    coords = {"group": frame["group"].unique()}
    with pm.Model(coords=coords) as model:
        # Data
        y = pm.MutableData("y", frame["observation"])
        g = pm.MutableData("g", frame["group"])
        # Population-level parameters
        mu = pm.Normal("mu")
        sigma = pm.HalfNormal("sigma")
        omega = pm.HalfNormal("omega", 10.0)
        # Group-level parameters (non-centered)
        epsilon_mu_g = pm.ZeroSumNormal("epsilon_mu_g", dims="group")
        mu_g = pm.Deterministic("mu_g", mu + omega * epsilon_mu_g)
        # Likelihood
        pm.Normal("y_obs", mu=mu_g[g], sigma=sigma, observed=y)
    return model
```
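For reference, the `mu + omega * epsilon_mu_g` line is the non-centered parametrization: the group means are expressed as unit-scale offsets multiplied by \omega, so the sampler works on standard-normal variables rather than on group means whose scale depends strongly on \omega. Ignoring the zero-sum constraint that `ZeroSumNormal` additionally imposes, the idea can be sketched in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, omega = 2.0, 0.1

# Standard-normal "raw" offsets, one per group; scaling and shifting
# them reproduces mu_g ~ Normal(mu, omega)
epsilon = rng.standard_normal(100_000)
mu_g = mu + omega * epsilon

print(mu_g.mean(), mu_g.std())  # close to mu = 2.0 and omega = 0.1
```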

For `omega=0.1` the model seems to behave perfectly reasonably; e.g., increasing `num_groups` “narrows” the posterior on the true values for \mu, \sigma, and \omega, cf. the table below.

| omega (true value) | num_groups | num_samples_per_group | parameter | mean | hdi_3% | hdi_97% | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 5 | 20 | mu | 1.902 | 1.762 | 2.024 | 2382 | 1913 | 1 |
| 0.1 | 5 | 20 | omega | 0.238 | 0 | 0.587 | 883 | 980 | 1.01 |
| 0.1 | 5 | 20 | sigma | 0.706 | 0.607 | 0.801 | 2842 | 2585 | 1 |
| 0.1 | 50 | 20 | mu | 1.985 | 1.94 | 2.031 | 4798 | 2500 | 1 |
| 0.1 | 50 | 20 | omega | 0.071 | 0 | 0.137 | 1266 | 2026 | 1 |
| 0.1 | 50 | 20 | sigma | 0.772 | 0.739 | 0.806 | 4429 | 2337 | 1 |
| 0.1 | 500 | 20 | mu | 1.995 | 1.98 | 2.009 | 8185 | 2322 | 1 |
| 0.1 | 500 | 20 | omega | 0.096 | 0.07 | 0.119 | 1482 | 1985 | 1 |
| 0.1 | 500 | 20 | sigma | 0.751 | 0.74 | 0.761 | 5512 | 2852 | 1 |

One can also observe that the ESS increases with the number of groups. My interpretation is that the only evidence for `omega` is the between-group variability, and for a small number of groups there is just not much evidence to be gathered. This in turn leads to a small ESS, especially for `omega`.

This is somewhat at odds with the “there is no minimum sample size for Bayesian data analysis” statement: the somewhat low ESS values leave me (as someone just getting started) doubtful that I will be able to trust my hierarchical model (due to problems with posterior sampling) in situations where I don’t control the data-generating process. I guess I am overlooking something here, but any input would be very welcome.

Now if I increase `omega` by a factor of ten and run the same experiment, I observe that the posterior HDI (for \mu) does not contain the true value for 5 and 50 groups, and the ESS values for `omega` drop significantly compared to those for the experiment with `omega=0.1`:

| omega (true value) | num_groups | num_samples_per_group | parameter | mean | hdi_3% | hdi_97% | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 5 | 20 | mu | 1.746 | 1.606 | 1.872 | 3210 | 1771 | 1 |
| 1 | 5 | 20 | omega | 1.112 | 0.413 | 2.077 | 660 | 841 | 1.01 |
| 1 | 5 | 20 | sigma | 0.706 | 0.61 | 0.802 | 2523 | 2146 | 1 |
| 1 | 50 | 20 | mu | 1.861 | 1.817 | 1.909 | 6323 | 2901 | 1 |
| 1 | 50 | 20 | omega | 0.922 | 0.737 | 1.117 | 501 | 670 | 1.01 |
| 1 | 50 | 20 | sigma | 0.773 | 0.741 | 0.807 | 4813 | 2549 | 1 |
| 1 | 500 | 20 | mu | 1.988 | 1.975 | 2.002 | 9388 | 2736 | 1 |
| 1 | 500 | 20 | omega | 0.991 | 0.93 | 1.051 | 264 | 665 | 1 |
| 1 | 500 | 20 | sigma | 0.75 | 0.741 | 0.761 | 7152 | 2465 | 1 |

For `omega=10`, the results look as follows:

| omega (true value) | num_groups | num_samples_per_group | parameter | mean | hdi_3% | hdi_97% | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 5 | 20 | mu | 0.149 | 0.018 | 0.288 | 1905 | 1915 | 1 |
| 10 | 5 | 20 | omega | 8.656 | 3.897 | 13.736 | 594 | 879 | 1.01 |
| 10 | 5 | 20 | sigma | 0.705 | 0.615 | 0.802 | 1541 | 1921 | 1 |
| 10 | 50 | 20 | mu | 0.623 | 0.578 | 0.672 | 3588 | 2597 | 1 |
| 10 | 50 | 20 | omega | 8.89 | 7.173 | 10.57 | 132 | 127 | 1.03 |
| 10 | 50 | 20 | sigma | 0.772 | 0.741 | 0.804 | 3405 | 2963 | 1 |
| 10 | 500 | 20 | mu | 1.922 | 1.909 | 1.936 | 939 | 997 | 1.01 |
| 10 | 500 | 20 | omega | 9.888 | 9.433 | 10.931 | 5 | 12 | 2.11 |
| 10 | 500 | 20 | sigma | 0.75 | 0.741 | 0.761 | 414 | 853 | 1 |

I am not entirely sure what I should make of this. One could argue that the standard error of the estimate for `mu` is essentially `omega / sqrt(num_groups)`, and the deviation between the posterior mean and the true value seems compatible with that number. Also, if `omega` is larger than `sigma`, does it even make sense to talk about a (group-level) random effect? There is not much that those groups share (in terms of the average of the observations within each group).
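The back-of-the-envelope check can be made concrete. Using the posterior means of `mu` from the `omega=10` table above, the deviation from the true value \mu=2 stays within roughly one standard error `omega / sqrt(num_groups)` in each case:

```python
import math

mu_true, omega_true = 2.0, 10.0
# Posterior means of mu from the omega=10 runs above, keyed by num_groups
posterior_mu = {5: 0.149, 50: 0.623, 500: 1.922}

z_scores = {
    g: abs(m - mu_true) / (omega_true / math.sqrt(g))
    for g, m in posterior_mu.items()
}
print(z_scores)  # all below 1, i.e. compatible with se = omega / sqrt(num_groups)
```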

I’d be glad for any suggestions on how to reconcile this intuition with the observed posterior summaries.