Non-centered parameterization of a beta distribution?

@twiecki wrote a great description of non-centering a hierarchical model, covering both why the sampler sometimes has difficulty exploring the space and what to do about it. Diagnosing Biased Inference with Divergences is also a good description, showing tools for diagnosing divergences and prescribing non-centering to avoid them. And Richard McElreath describes an amazingly simple model—which he dubs the Devil’s Funnel—that exhibits divergent transitions, fixable by reparameterizing to a non-centered representation (2nd edition, pages 421–423).

But all three of these descriptions of non-centering share a common feature: the variable being reparameterized follows a normal distribution.

What if the problem variable is a beta instead? Is there a way to non-center a beta distribution?

For example, consider the following variation of McElreath’s Devil’s Funnel:

import pymc3 as pm
import numpy as np
import arviz as az

max_sigma = 0.25
with pm.Model() as lucifers_funnel:
    # as v grows, sigma = max_sigma / exp(v) collapses toward zero,
    # creating the funnel-shaped geometry that defeats the sampler
    v = pm.HalfNormal('v', 3.0)
    x = pm.Beta('x', mu=0.5, sigma=max_sigma / pm.math.exp(v))

with lucifers_funnel:
    trace = pm.sample(random_seed=19950526)

This Lucifer’s Funnel is a lot like McElreath’s Devil’s Funnel, and like the Devil’s Funnel it exhibits many divergent transitions.

Lucifer’s Funnel desperately needs non-centering: somehow move v out of pm.Beta() and redefine x with a pm.Deterministic(). But how? Is there a transformation that accomplishes non-centering for this beta?
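For comparison, here is what the standard trick looks like when the problem variable is normal: sample a fixed standard normal, then shift and scale it inside a pm.Deterministic(). (A minimal sketch of the usual pattern; the names normal_non_centered and x_raw are mine, not from any of the write-ups above.)

import pymc3 as pm

with pm.Model() as normal_non_centered:
    v = pm.HalfNormal('v', 3.0)
    # the raw variable has a fixed, well-conditioned geometry
    x_raw = pm.Normal('x_raw', 0.0, 1.0)
    # the scale enters only through a deterministic transform,
    # decoupling the geometry of x_raw from v
    x = pm.Deterministic('x', 0.5 + x_raw * 0.25 / pm.math.exp(v))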


Your model is not hierarchical, right? That’s what the non-centered parametrization is usually needed for.

Isn’t your issue simply that sigma can easily get too close to zero? That would indicate you need a more constrained prior on v, or that you should model sigma directly.
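Modeling sigma directly might look like this (a sketch under my own prior choice; pm.Uniform here is just one way to keep \sigma below the boundary \sqrt{\mu(1-\mu)} = 0.5):

import pymc3 as pm

with pm.Model() as direct_sigma:
    # give sigma its own bounded prior instead of deriving it from v
    sigma = pm.Uniform('sigma', lower=0.0, upper=0.25)
    x = pm.Beta('x', mu=0.5, sigma=sigma)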

Also, did you try increasing target_accept or the number of tuning iterations? That can be a reasonable approach when you have a tough posterior.
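For example (the particular values are arbitrary illustrations, not recommendations):

with lucifers_funnel:
    trace = pm.sample(tune=2000, target_accept=0.95, random_seed=19950526)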

To answer your original question, though: I am not familiar with a way to parametrize the beta in a non-centered way. You can’t simply multiply by the sigma and add a mean, because the variable must stay bounded between 0 and 1. You can parametrize it in terms of a Dirichlet times a concentration, but that is not really the same thing. The closest would perhaps be to drop the beta entirely and model it with a logit-normal: Logit-normal distribution - Wikipedia
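A sketch of what that swap could look like, written in non-centered form on the logit scale (the model name, x_raw, and the exact scaling are my illustration; sigmoid(0) = 0.5 matches the mu=0.5 of the original model):

import pymc3 as pm

max_sigma = 0.25
with pm.Model() as logitnormal_funnel:
    v = pm.HalfNormal('v', 3.0)
    # raw variable on the unconstrained logit scale
    x_raw = pm.Normal('x_raw', 0.0, 1.0)
    # scale on the logit scale, then squash back into (0, 1);
    # x is logit-normal rather than beta, but plays the same role
    x = pm.Deterministic(
        'x', pm.math.sigmoid(x_raw * max_sigma / pm.math.exp(v)))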


My understanding is that Lucifer’s Funnel is in fact a hierarchical model. At least it is a variation of McElreath’s Devil’s Funnel, which is described in a chapter devoted to hierarchical models.

(If this is a copyright violation, many apologies. Buy the book. It’s useful, well-written, and funny. I literally laughed out loud while reading it, frightening those around me.)

To answer your question: I can reduce the divergences by increasing target_accept, both in this toy model and in my actual model. But I really would like to change the difficult geometry that causes the divergences, and remove them entirely.

Approximate the beta with a logit-normal? I’ll look into that. But it seems a heavy-handed approach if there is some way to reparameterize the beta into a form that is mathematically identical but numerically different, assuming such a form exists.

I don’t think this is so much a funnel as a problem with the boundary condition that \sigma^2 < \mu(1-\mu). Even though you cap sigma with max_sigma, your HalfNormal can still sample values of v near 0, pushing \sigma toward its maximum and thus closest to the boundary.

Edit: It’s probably not even that. My guess is that the gradient of 1/\exp(v) blows up somewhere and generates the divergences; but with the following parameterization (which is logistic in the transformed variable), the gradients are better behaved and don’t cause the same kinds of problems.

Using the fully independent parameterization

\alpha = \frac{\mu}{\gamma}, \qquad \beta = \frac{1-\mu}{\gamma},

whereby

\sigma^2 = \frac{\gamma}{1 + \gamma}\mu(1-\mu).
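To fill in the algebra behind the “whereby” (standard beta moments, spelled out for readers following along):

\alpha + \beta = \frac{1}{\gamma}, \qquad \alpha\beta = \frac{\mu(1-\mu)}{\gamma^2},

so

\sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \frac{\mu(1-\mu)}{\frac{1}{\gamma}+1} = \frac{\gamma}{1+\gamma}\mu(1-\mu).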

This parameterization does not generate such divergences:

import pymc3 as pm
import numpy as np
import arviz as az

max_sigma = 0.25
with pm.Model() as lucifers_funnel:
    loggamma = pm.Normal('lgamma', 3.0)
    # sigmoid of loggamma: this is the shrinkage factor gamma/(1 + gamma)
    gamma = pm.Deterministic('gamma', pm.math.exp(loggamma)/(1 + pm.math.exp(loggamma)))
    # sigma is the shrinkage factor times mu*(1 - mu), which is always
    # below 0.25 and so keeps sigma^2 safely inside sigma^2 < mu*(1 - mu)
    x = pm.Beta('x', mu=0.5, sigma=gamma * 0.5 * (1 - 0.5))

with lucifers_funnel:
    trace = pm.sample(random_seed=19950526)

NUTS: [x, lgamma]
Sampling 4 chains, 0 divergences: 100%|██████████| 4000/4000 [00:00<00:00, 7567.42draws/s]
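To confirm the absence of divergences programmatically (a sketch using the already-imported arviz; the exact count depends on your run):

idata = az.from_pymc3(trace)
# count divergent transitions across all chains; expect 0 here
print(int(idata.sample_stats.diverging.sum()))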

@chartl that’s quite interesting. Out of curiosity, do you see any advantage in an approach like this compared to just using a logitnormal for controlling both mu and sigma (i.e. discarding the beta distribution altogether)?

Depends on the setting. You can recognize the factor \gamma/(1+\gamma) as just an inflation factor over the expected Binomial variance; and in genetics the parameter \gamma (or sometimes 1-\gamma or sometimes the whole ratio) is known as F_{\mathrm{ST}} and reflects the variability of allele frequencies across sub-populations. It is a common target for Bayesian inference.

I suppose that in a logitnormal model, \sigma^2 could serve the same purpose; but in this case it would reflect some latent-scale variance unlinked to an “excess over the binomial variance”; so it may be less clear to interpret.

Also if one observes Binomial counts, it’s convenient to stick with a conjugate distribution.


Nice one, thanks @chartl - I might try this parameterisation in future

@chartl’s gamma-sigma parameterization also samples well in my actual model, not just in the toy example above.


You can thank Sewall Wright for this one, I think. 🙂

When I see him, I will thank him. Hopefully not soon.

While I was searching for something completely unrelated, I came across this old thread about beta parametrization, and it links through to “implicit normalization of gradients”:


I’m still puzzling my way through the math in that paper. But implementing their approach would seem to require making changes to PyMC3, rather than just reparameterizing the model. (Or perhaps I am misunderstanding.)