Feedback on a Hierarchical Model for Learning Shared Coefficient of Variation

Your approach looks reasonable to me. I am not an expert, but one use of hierarchical models is to remedy biases in subgroups when, by bad luck, you have drawn a particularly unrepresentative sample for that subgroup (i.e. a model more robust against outliers), so its effect could be more significant for smaller sample sizes. There is one thing that looks like it could be modelled more efficiently, though:

np.random.normal(means[i], means[i] * CoVs[i])

I assume you use this because each group's data has a different natural scale, but presumably you could try rescaling each group by subtracting the sample mean and dividing by the sample std, and then applying the inverse transformation to your estimates. Another alternative would be a lognormal instead of a normal likelihood, where the amount of sd naturally depends on mu. Also, are you generating the data like this because that is what you suspect the data-generating process looks like, or do you have good reason to think it works that way? If the former, you might just leave the log sigma as a free parameter, such as

log_group_cov = pm.Exponential("log_group_cov", 1, dims="groups")
group_cov = pm.math.exp(log_group_cov)

And if you still want to recover the natural-scale CoV from sigma, you can compute it on the fly with something like:

natural_group_cov = pm.Deterministic("natural_group_cov", group_cov/group_mean)

Perhaps other folks could have some more suggestions on this. When you say

correct supports for the prior distributions and prevent double counting of mean uncertainty in the posterior predictive model as the standard deviation will be a function of mean and CoV estimate

by double counting of the mean, do you mean that the mean enters both the location and the sd? If so, some of the suggestions above could be helpful.
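Concretely, the rescaling idea from above would look something like this in plain NumPy (made-up numbers; this only shows the forward transform and its inverse, with the model fit happening in between):

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(150, 7.5, size=50)  # one group's raw data (made-up)

# standardize to a common scale before fitting
mu_hat, sd_hat = raw.mean(), raw.std()
z = (raw - mu_hat) / sd_hat

# ... fit the model on z ...

# map an estimate on the z-scale back to the natural scale
z_estimate = z.mean()  # stand-in for a posterior estimate on the z-scale
natural_estimate = z_estimate * sd_hat + mu_hat
```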

Anyway, you can try some of the suggestions above, but I also took a shot at your model. I have modified your code slightly so that the simulated CoV values also come from a distribution that matches your model. I also changed the prior for the means to a lognormal (assuming they are always positive), so it can be expressed on a more natural scale and you won't need the abs() for group_mean*group_CoV. Finally, I added an extra parameter, cov_sigma. The model is here:


import numpy as np
import pymc as pm
import arviz as az

seed=10
rng = np.random.default_rng(seed)

cov_sigma_real = 1
cov_mu_real = 0.05
means_real = [20, 150, 78, 65]
CoVs = rng.lognormal(np.log(cov_mu_real), cov_sigma_real, size=4)
samples = 50

group_vals = np.zeros((samples, len(means_real)))
group_labels = np.zeros((samples, len(means_real)), dtype=int)

# simulate each group's data and record its group label
for i, mean in enumerate(means_real):
    group_vals[:, i] = rng.normal(mean, mean * CoVs[i], size=samples)
    group_labels[:, i] = i

group = group_labels.T.flatten()
val = group_vals.T.flatten()
coords = {"groups": [0,1,2,3]}


with pm.Model(coords=coords) as HCoV_model:
    group_idx = pm.MutableData("group_idx", group, dims='obs_id')
    cov_sigma = pm.Exponential("cov_sigma", 1)
    CoV_global = pm.LogNormal("global_CoV", mu=-3.0, sigma=1)

    group_CoV = pm.LogNormal("group_CoV", mu=pm.math.log(CoV_global),
                             sigma=cov_sigma,
                             dims="groups")

    group_mean = pm.LogNormal("group_mean", mu=np.log(80), sigma=2, dims="groups")

    sigma = pm.Deterministic("sigma", group_mean*group_CoV)

    mu_obs = group_mean[group_idx]
    sigma_obs = sigma[group_idx]

    y = pm.Normal("y", mu=mu_obs, sigma=sigma_obs, observed=val, dims="obs_id")

    trace = pm.sample(tune=5000, draws=5000, random_seed=42, target_accept=0.95)

with HCoV_model:
    ppc = pm.sample_posterior_predictive(trace)#, predictions=True)


az.plot_ppc(ppc)

ax = az.plot_posterior(trace, var_names = ["cov_sigma", "global_CoV",
                                           "group_CoV", "group_mean"])
ax = ax.flatten()
ax[0].scatter(cov_sigma_real, 0, marker='o', color='red')
ax[1].scatter(cov_mu_real, 0, marker='o', color='red')

for i in range(4):
  ax[i+2].scatter(CoVs[i], 0, marker='o', color='red')
  ax[i+6].scatter(means_real[i], 0, marker='o', color='red')

ax[0].get_figure().tight_layout()

The posteriors and ppc look fine (posterior estimates become more efficient as N increases, but in any case the HDIs seem to cover the true parameters). The posterior predictive plot also looks reasonable. One thing I have noticed is that you get about ~200 divergences in 40000 samples even with target_accept=0.95, so perhaps reparametrizing could help there too.


Also, if you want to see more examples of hierarchical models, chapter 9 of John Kruschke's Doing Bayesian Data Analysis is good. There are also examples like this one:

https://allendowney.github.io/BayesianInferencePyMC/04_hierarchical.html
