Gelman's (alpha + beta)^-5/2 parametrization for beta prior in AB test

hadron · April 2, 2021, 6:00pm

Dear experts,

I built a hierarchical model for my AB test of click rates, using Gelman’s parametrization (alpha + beta)^-5/2 (see e.q. 5.9 in BDA) for my beta prior. In the book this parametrization seems to be justified with this statement

"In a problem such as this with a reasonably large amount of data, it is possible to set up a
‘noninformative’ hyperprior density that is dominated by the likelihood and yields a proper
posterior distribution."

where he refers to the rat tumor example.
In my case I only have two observations, one for option A and one for option B, so I am aware of this difference wrt the BDA example. Based on this I built the following model:

def logp_ab(value):
    return tt.log(tt.pow(tt.sum(value), -5/2))

with pm.Model() as model:
    ab = pm.HalfFlat('ab', shape=2, testval=np.asarray([1., 1.]))
    pm.Potential('p(a, b)', logp_ab(ab))
    alpha = pm.Deterministic('alpha', ab[0])
    beta = pm.Deterministic('beta', ab[1])
    theta = pm.Beta('theta', alpha=ab[0], beta=ab[1], shape=2) 
    p = pm.Binomial('y', p=theta, observed=[232, 194], n=[7890, 7795])

running this with

pm.sample(draws=8000, tune=5000, chains=4, return_inferencedata=True, cores=4)

yields 8777 divergences, which probably means reparametrizing my model (I tried increasing the target_accept rate and also increase the burn-ins but still gives divergences.
I looked at the distributions of my parameters w/o divergences and w/ divergences, see below, but this doesn’t tell me much to my beginner/untrained eye. Except maybe that the divergences are not clustered around some particular values, which I’d think means that a re-parametrization is required.
What options do I have for ‘weakly informative priors’? E.g. a gamma function for the variance of my beta prior? I am trying to find the ‘weakliest informative prior’ for this case.

chartl · April 2, 2021, 7:08pm

That’s somewhat of an ugly parametrization; using

x = \frac{a}{a+b}
and
y = \sqrt{a+b}

you can sample x and 1/y uniformly, and use the transformation

a = xy^2
b = y^2 - xy^2 = (1-x)y^2

to get an equivalent prior from uniform samples. See Hierarchical AB testing in PyMC3 - #2 by chartl

hadron · April 2, 2021, 8:42pm

thank you @chartl ! That’s a great idea and helps, but I’m still getting divergences, also after restricting the range of x and y.

chartl · April 2, 2021, 8:46pm

Hmm. Can you do a pairplot of x and 1/y (or however you drew the prior)?

hadron · April 2, 2021, 9:37pm

Here it is, with the range [0, 0.1]
pairplot

and here when using [0, 1]
pairplot2

chartl · April 2, 2021, 9:54pm

That looks about right. Have you tried slightly increasing target accept? I don’t get any divergences:


with pm.Model() as model:
    x = pm.Uniform('x', 0., 1.)
    yi = pm.Uniform('yi', 0., 1.)
    y = pm.Deterministic('y', 1./yi)
    a = pm.Deterministic('a', x/yi**2)
    b = pm.Deterministic('b', (1-x)/yi**2)
    theta = pm.Beta('theta', alpha=a, beta=a, shape=2) 
    p = pm.Binomial('yobs', p=theta, observed=[232, 194], n=[7890, 7795]) 
    
    tr = pm.sample(500, tune=1000, chains=6, cores=3, nuts_kwargs=dict(target_accept=0.9))

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (6 chains in 3 jobs)
NUTS: [theta, yi, x]
Sampling 6 chains, 0 divergences: 100%|██████████| 9000/9000 [00:04<00:00, 1848.89draws/s]

hadron · April 2, 2021, 11:27pm

thank you for your quick reply and cross checks @chartl! I forgot this in my model:

Now it runs w/o divergences! However, I’m still not sure why this works: your parametrization has certainly better interpretability, but at the end shouldn’t they both give the same results?
Perhaps it’s related to the obfuscating (at least to me) pm.Potential?
Either way, thanks a ton for your help!

chartl · April 3, 2021, 6:04am

Yes, the likelihoods in terms of (\alpha, \beta) are proportional, so the posteriors should match. The issue is that P(x) \propto x^{-\alpha} is Pareto, and doesn’t have support for values of x < 1 (or some other positive constant), yet it is totally plausible for \alpha + \beta < 1 in a Beta distribution. You can use pm.Bound as in:

ab = pm.Bound(pm.HalfFlat, lower=1)('ab', shape=2, testval=np.asarray([2., 2.]))
pm.Potential('p(a, b)', logp_ab(ab))

and that will help somewhat (and it also makes sense to want \alpha + \beta > 1).

I wound up getting some divergences myself even with the uniform parameterization, which can be dealt with in a similar manner by taking y = pm.Deterministic('y', 1./(yi + eps)). Ultimately you have to do something to make this prior integrable.

Topic		Replies	Views
Hierarchical AB Testing Posterior Delta Questions	2	572	August 4, 2020
Reparametizing Gelman's rat tumour example Questions	7	984	June 26, 2018
There were *divergences after tuning. Increase `target_accept` or reparameterize. & The number of effective samples is smaller than 10% for some parameters Questions	7	2799	November 15, 2021
Hierarchical AB Testing Interpretation Questions	4	1015	January 29, 2021
Hierarchical AB testing in PyMC3 Questions	4	2510	March 2, 2021

Gelman's (alpha + beta)^-5/2 parametrization for beta prior in AB test

Related topics