Convergence with noncentered hierarchical model

I have an issue where a noncentered hierarchical model suffers from lots of divergences while the centered version works just fine. This is counter to my expectations so I suspect I’m probably implementing the noncentered variables incorrectly. I create a dataset:


idist = scipy.stats.norm(loc=10, scale=2)

sdist = scipy.stats.norm(loc=10, scale=0.5)

noise = scipy.stats.norm(loc=0, scale=0.1)

nsamples = 4

intercepts = idist.rvs(nsamples)
slopes = sdist.rvs(nsamples)

point_min = 2
point_range = 10

npoints = [int(point_range * np.random.rand()) + point_min for _ in range(nsamples)]

density_min = 0.90
density_range = 0.20

densities = [np.sort(density_range * np.random.rand(n) + density_min) for n in npoints]
density_mean = np.mean(np.hstack(densities))
density_std = np.std(np.hstack(densities))

normdensities = [(d - density_mean)/density_std for d in densities]

moduli = [s * (d - 0.5 * density_range + density_min) + i + noise.rvs(d.size) for d, s, i in zip(densities,     slopes, intercepts)]
moduli_mean = np.mean(np.hstack(moduli))
moduli_std  = np.std(np.hstack(moduli))

normmoduli = [(m - moduli_mean)/moduli_std for m in moduli]

So I can specify my centered model as
with pymc3.Model() as noncentered_model:

    m = pymc3.Normal('m', mu=0, sd=1)

    b = pymc3.Normal('b', mu=0, sd=1)

    zb = pymc3.Normal('zb', mu=0, sd=1, shape=(nsamples,))

    bi = pymc3.Deterministic('bi', b + zb)

    noise = pymc3.HalfNormal('noise', sd = 1)

_ynorm = [None for _ in range(nsamples)]

_y = [None for _ in range(nsamples)]

_l = [None for _ in range(nsamples)]

for i in range(nsamples):

    with noncentered_model:
        _ynorm[i] = pymc3.Deterministic(f'ynorm_{i+1}', m * normdensities[i]+ bi[i])
        _y[i] = pymc3.Deterministic(f'y_{i+1}', _ynorm[i] * moduli_std + moduli_mean)
        _l[i] = pymc3.Normal(f'l{i+1}', mu = _ynorm[i], sd=noise, observed = normmoduli[i])

This approach fails with MANY divergences. Specifying bi directly (bi = pymc3.Normal(‘bi’, mu=b, sd=1, shape=(nsamples,))) works great.

Does anyone see where I’m going wrong?

I don’t have the references at hand, but the noncentered parametrization is not universally better, if it were we’d use it everywhere and probably even have tried to have pymc3 use it under the hood even when not specified explicitly. Depending on the model and data you’ll have a funnel geometry and sampling problems when using the noncentered parametrization instead of when using the centered one. In these cases the centered one generally samples efficiently and without problem.

@OriolAbril Okay! From all that I had read I was under the impression that centered approaches were for chumps and that you always wanted to use centered distributions so this is really helpful. Thanks!

Michael Betancourt discusses the trade offs here. Interestingly, in this section Betancourt uses a heuristic criteria to assign parameters with fewer observations (< 25) a non-centered parameterization and the rest a centered parameterization. In his toy example, the mixed parameterization is better suited (i.e. shows no divergences) than the uniformly centered or non-centered parameterizations.

I’m still trying to wrap my head around it myself, but this is something I am trying to apply to my own problem at the moment.


Thanks for that reference @Alex_K (and welcome to the forum!). I’ll go through that carefully.

Thanks for sharing this post! I really wanted to experiment with the mixed-parameterization that Betancourt shows in Stan, but in PyMC3, so I put together this blog post. It’s a bit long, so I copied the model-building function below so you can see how I accomplished it. Did you ever implement this type of model, and if so, how did you code it? I feel like mine is a bit clunky.

def mixed_parameterization(
    d: dict[str, Any],
    cp_idx: np.ndarray,
    ncp_idx: np.ndarray,
    draws: int = 10000,
    tune: int = 1000,
) -> tuple[pm.Model, az.InferenceData, dict[str, np.ndarray]]:
    n_cp = len(np.unique(cp_idx))
    n_ncp = len(np.unique(ncp_idx))
    with pm.Model() as mixp_model:
        mu = pm.Normal("mu", 0, 5)
        tau = pm.HalfNormal("tau", 5)

        theta_cp = pm.Normal("theta_cp", mu, tau, shape=n_cp)
        eta_ncp = pm.Normal("eta_ncp", 0, 1, shape=n_ncp)
        theta_ncp = pm.Deterministic("theta_ncp", mu + tau * eta_ncp)

        _theta = list(range(d["K"]))
        for i, t in enumerate(cp_idx):
            _theta[t] = theta_cp[i]
        for i, t in enumerate(ncp_idx):
            _theta[t] = theta_ncp[i]

        theta = pm.Deterministic("theta", pm.math.stack(_theta))

        y = pm.Normal("y", theta[d["idx"]], d["sigma"], observed=d["y"])

        trace = pm.sample(
        y_post_pred = pm.sample_posterior_predictive(
            trace=trace, random_seed=RANDOM_SEED
    return mixp_model, trace, y_post_pred

For context, d is a data dictionary where "K" is the number of individual groups, cp_idx and ncp_idx are the indices of the individuals that should get a centered and non-centered parameterization, respectively.

I did not try to implement the mixed-parameterization, but I will check out your post! I’m still a relative newcomer to pymc3, so I don’t think I would know the best (or a better) way to do it at this point. I’ll experiment with your code and let you know if I learn anything.