Non-centered parameterization of a beta distribution?

@twiecki wrote a great description of non-centering a hierarchical model, both why the sampler sometimes has difficulty exploring the space and what to do about it. Diagnosing Biased Inference with Divergences is also a good description, showing tools for diagnosing divergences, and prescribing non-centering to avoid them. And Richard McElreath describes an amazingly simple model—which he dubs the Devil’s Funnel—that exhibits divergent transitions, fixable by reparameterizing to a non-centered representation (2e, pages 421-423).

But all three descriptions of non-centering share a common feature: the variable being reparameterized follows a normal distribution.
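For reference, the normal-distribution trick those posts describe is just shift-and-scale: draw a standard normal, then transform it. A quick numpy sketch (illustrative only; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.3
n = 100_000

# Centered: sample x ~ Normal(mu, sigma) directly.
x_centered = rng.normal(mu, sigma, size=n)

# Non-centered: sample a standard normal, then shift and scale.
z = rng.normal(0.0, 1.0, size=n)
x_noncentered = mu + sigma * z

# Both sets of draws target the same distribution.
print(x_centered.mean(), x_noncentered.mean())
```

The non-centered form works precisely because the normal family is closed under shifting and scaling; the question below is whether the beta family admits anything comparable.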

What if the problem variable is a beta instead? Is there a way to non-center a beta distribution?

For example, consider the following variation of McElreath’s Devil’s Funnel:

import pymc3 as pm
import numpy as np
import arviz as az

max_sigma = 0.25
with pm.Model() as lucifers_funnel:
    v = pm.HalfNormal('v', 3.0)
    x = pm.Beta('x', mu=0.5, sigma=max_sigma/pm.math.exp(v))

with lucifers_funnel:
    trace = pm.sample(random_seed=19950526)

This Lucifer’s Funnel is a lot like McElreath’s Devil’s Funnel. And as with Devil’s Funnel, Lucifer’s Funnel exhibits many divergences.

Lucifer’s Funnel desperately needs non-centering, to somehow move v out of pm.Beta(), and to redefine x with a pm.Deterministic(). But how? Is there a transformation that accomplishes non-centering for this beta?


Your model is not hierarchical, right? That’s what the non-centered parametrization is usually needed for.

Isn’t your issue simply that sigma can easily get too close to zero? That would suggest you need a more constrained prior on v, or to model sigma directly.

Also, did you try changing target_accept or the number of tuning iterations? That can be a reasonable approach when you have a tough posterior.

To answer your original question, though: I am not familiar with a way to parametrize the beta in a non-centered way. You can’t simply multiply by the sigma and add a mean, because the result must be bounded between 0 and 1. You can parametrize it as a Dirichlet times a concentration, but that is not really the same thing. The closest would perhaps be to drop the beta entirely and model it with a logit-normal: Logit-normal distribution - Wikipedia
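To make the logit-normal suggestion concrete (my sketch, not from the post): a logit-normal variable is just a normal squashed through the logistic function, so it inherits the normal’s non-centered form directly while staying inside (0, 1).

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(1)
mu, sigma = 0.0, 1.0

# Non-centered logit-normal: standard normal draws, shifted and scaled,
# then squashed through the logistic function onto (0, 1).
z = rng.normal(0.0, 1.0, size=100_000)
x = sigmoid(mu + sigma * z)

# Every draw lies strictly inside the unit interval.
print(x.min(), x.max())
```

The catch, as noted above, is that this is a different distribution from the beta, not a reparameterization of it.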


My understanding is that Lucifer’s Funnel is in fact a hierarchical model. At least, it is a variation of McElreath’s Devil’s Funnel, which is described in a chapter devoted to hierarchical models.

(If this is a copyright violation, many apologies. Buy the book. It’s useful, well-written, and funny. I literally laughed out loud while reading it, frightening those around me.)

To answer your question: I can reduce the divergences by increasing target_accept, both in this toy model and in my actual model. But I would really like to change the difficult geometry that causes the divergences, to remove them entirely.

Approximate the beta with a logit-normal? I’ll look into that. But that seems a heavy-handed approach if there is some way to reparameterize the beta into a form that is mathematically identical but numerically different, assuming such a form exists.

I don’t think this is so much a funnel as a problem with the boundary condition that \sigma^2 < \mu(1-\mu). Even though you have the max sigma, your HalfNormal can still attempt to sample values near 0; and thus near the boundary.

Edit: It’s probably not even that. My guess is that the gradient of 1/\exp(v) blows up somewhere and generates the divergence; but with the following parameterization (which is logistic in the transformed variable) the gradients are better behaved and don’t cause the same kinds of problems.
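A quick arithmetic check supports that edit (my sketch; mu and max_sigma come from the toy model above): with mu = 0.5 the constraint sigma^2 < mu(1-mu) allows sigma up to 0.5, while sigma = max_sigma/exp(v) never exceeds 0.25 for v >= 0 under the HalfNormal prior, so the boundary itself is never violated; sigma can only collapse toward zero.

```python
import math

mu = 0.5
max_sigma = 0.25
# The beta's mean-variance parameterization requires sigma < sqrt(mu*(1-mu)).
bound = math.sqrt(mu * (1 - mu))   # = 0.5 here

# v >= 0 under the HalfNormal prior, so sigma = max_sigma / exp(v) <= max_sigma.
sigmas = [max_sigma / math.exp(v) for v in (0.0, 0.5, 1.0, 5.0, 10.0)]
print(max(sigmas), bound)
```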

Using the fully independent parameterization of:

\alpha = \frac{\mu}{\gamma}
\beta = \frac{1-\mu}{\gamma}

whereby \sigma^2 = \frac{\gamma}{1 + \gamma}\mu(1-\mu)

does not generate such divergences:

import pymc3 as pm
import numpy as np
import arviz as az

max_sigma = 0.25
with pm.Model() as lucifers_funnel:
    loggamma = pm.Normal('lgamma', 3.0)
    # sigmoid of loggamma, i.e. the variance-inflation ratio gamma/(1 + gamma)
    gamma = pm.Deterministic('gamma', pm.math.exp(loggamma)/(1 + pm.math.exp(loggamma)))
    x = pm.Beta('x', mu=0.5, sigma=gamma * 0.5 * (1-0.5))

with lucifers_funnel:
    trace = pm.sample(random_seed=19950526)

NUTS: [x, lgamma]
Sampling 4 chains, 0 divergences: 100%|██████████| 4000/4000 [00:00<00:00, 7567.42draws/s]
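As a sanity check on the algebra (my addition, not part of the original post), the claimed identity follows from the closed-form beta variance \alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1)):

```python
def beta_variance(mu, gamma):
    """Variance of Beta(alpha, beta) with alpha = mu/gamma, beta = (1-mu)/gamma."""
    a = mu / gamma
    b = (1 - mu) / gamma
    return a * b / ((a + b) ** 2 * (a + b + 1))

def claimed_variance(mu, gamma):
    """The identity from the post: sigma^2 = gamma/(1+gamma) * mu*(1-mu)."""
    return gamma / (1 + gamma) * mu * (1 - mu)

# The two expressions agree for any mu in (0, 1) and gamma > 0.
print(beta_variance(0.5, 2.0), claimed_variance(0.5, 2.0))
```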

@chartl that’s quite interesting. Out of curiosity, do you see any advantage in an approach like this compared to just using a logitnormal for controlling both mu and sigma (i.e. discarding the beta distribution altogether)?

Depends on the setting. You can recognize the factor \gamma/(1+\gamma) as just an inflation factor over the expected Binomial variance; and in genetics the parameter \gamma (or sometimes 1-\gamma or sometimes the whole ratio) is known as F_{\mathrm{ST}} and reflects the variability of allele frequencies across sub-populations. It is a common target for Bayesian inference.

I suppose that in a logitnormal model, \sigma^2 could serve the same purpose; but in this case it would reflect some latent-scale variance unlinked to an “excess over the binomial variance”; so it may be less clear to interpret.

Also if one observes Binomial counts, it’s convenient to stick with a conjugate distribution.
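For context on that convenience (my sketch, not from the thread): with binomial counts the beta prior updates in closed form, which is exactly what a logit-normal would give up.

```python
# Beta-binomial conjugacy: observing k successes in n trials updates
# a Beta(a, b) prior to a Beta(a + k, b + n - k) posterior in closed form.
def beta_binomial_posterior(a, b, k, n):
    return a + k, b + (n - k)

a_post, b_post = beta_binomial_posterior(2.0, 2.0, k=7, n=10)
posterior_mean = a_post / (a_post + b_post)   # (2 + 7) / (2 + 2 + 10)
print(a_post, b_post, posterior_mean)
```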


Nice one, thanks @chartl - I might try this parameterisation in future

@chartl’s gamma-sigma parameterization also samples well in my actual model, not just the toy example worked here.


You can thank Sewall Wright for this one, I think. :slight_smile:

When I see him, I will thank him. Hopefully not soon.

While I was searching for something completely unrelated, I came across this old thread about beta parametrization, and it links through to “implicit normalization of gradients”:


I’m still puzzling my way through the math in that paper. But implementing their approach would seem to require making changes to PyMC3, rather than just reparameterizing the model. (Or perhaps I am misunderstanding.)