Hierarchical AB Testing Interpretation

Zappageck · January 27, 2021, 6:15pm

Hi

I’ve got two groups (A & B) with multiple studies (in this case two each), which I want to compare but do not want to pool the studies.
Therefore I created a hierarchical model based on the tipps from this thread (Hierarchical AB testing in PyMC3)
However I’m not sure how to draw a good conclusion, because I get multiple ‘deltaABX’ (one for each study)

How can I get an ‘overall’ (one) estimation of the delta of the rates of A and B?

Example code:

import pymc3 as pm
import theano.tensor as tt
import numpy as np

trials_A=np.array([ 96, 185])
successes_A=np.array([25, 36])
trials_B=np.array([216, 100])
successes_B=np.array([52, 19])

def logp_ab(value):
    return tt.log(tt.pow(tt.sum(value), -5/2))

with pm.Model() as model2:

    # Uninformative prior for alpha and beta (Group A)

    ab_A = pm.HalfFlat('ab_A', shape=2, testval=np.asarray([1., 1.]))
    pm.Potential('p(a, b)_A', logp_ab(ab_A))

    

    # # Our alpha and beta. Remove this code if you don't want to track alpha
    # # and beta
    # alpha_A = pm.Deterministic('alpha_A', ab_A[0])
    # beta_A = pm.Deterministic('beta_A', ab_A[1])

    theta_A = pm.Beta('theta_A', alpha=ab_A[0], beta=ab_A[1], shape=trials_A.shape[0]) 
    p_A = pm.Binomial('y_A', p=theta_A, observed=successes_A, n=trials_A) 

    # the same as above for second group (B)
    ab_B = pm.HalfFlat('ab_B', shape=2, testval=np.asarray([1., 1.]))
    pm.Potential('p(a, b)_B', logp_ab(ab_B))

    # alpha_B = pm.Deterministic('alpha_B', ab_B[0])
    # beta_B = pm.Deterministic('beta_B', ab_B[1])
    theta_B = pm.Beta('theta_B', alpha=ab_B[0], beta=ab_B[1], shape=trials_B.shape[0]) 

    p_B = pm.Binomial('y_B', p=theta_B, observed=successes_B, n=trials_B) 
    deltaABX = pm.Deterministic('deltaABX',theta_B - theta_A) # for each theta

    trace2 = pm.sample(draws=5_000, tune=5_000)


c_interval=0.95
pm.plot_posterior(trace2,credible_interval=c_interval, point_estimate='mean', ref_val=0.0) 
pm.summary(trace2,credible_interval=c_interval)

nkaimcaudle · January 28, 2021, 1:27am

One option is to link the two studies within each group.
Below I assume the difference between the two studies is the same within each group (in logistic space)

with pm.Model() as model3:
    # Uninformative prior for alpha and beta (Group A)
    #ab_A = pm.HalfNormal('ab_A', 5., shape=2, testval=np.asarray([1., 1.]))
    ab = pm.HalfNormal('ab', 5., shape=2, testval=np.asarray([1., 1.]))
    pm.Potential('p(a, b)', logp_ab(ab))
    
    study_diff = pm.Normal('study_diff', 0, 1)
    
    theta_A_one = pm.Beta('theta_A_one', alpha=ab[0], beta=ab[1])
    theta_A_two = pm.math.sigmoid(pm.math.logit(theta_A_one) + study_diff)
    theta_A = pm.Deterministic('theta_A', tt.stack((theta_A_one, theta_A_two)))
    p_A = pm.Binomial('y_A', p=theta_A, observed=successes_A, n=trials_A) 

    # the same as above for second group (B)
    #ab_B = pm.HalfNormal('ab_B', 5., shape=2, testval=np.asarray([1., 1.]))
    #pm.Potential('p(a, b)_B', logp_ab(ab_B))

    #theta_B = pm.Beta('theta_B', alpha=ab_B[0], beta=ab_B[1], shape=trials_B.shape[0]) 
    theta_B_one = pm.Beta('theta_B_one', alpha=ab[0], beta=ab[1])
    theta_B_two = pm.math.sigmoid(pm.math.logit(theta_B_one) + study_diff)
    theta_B = pm.Deterministic('theta_B', tt.stack((theta_B_one, theta_B_two)))

    p_B = pm.Binomial('y_B', p=theta_B, observed=successes_B, n=trials_B) 
    deltaABX = pm.Deterministic('deltaABX', theta_B_one - theta_A_one) # for each theta

    trace2 = pm.sample(draws=5_000, tune=8_000, target_accept=0.95, return_inferencedata=True)
az.summary(trace2)

Zappageck · January 28, 2021, 4:13pm

Thanks for your reply!

I have some diffulty understanding everything you did here. Partly because the code does not run and I cannot observe the output:
“function() got an unexpected keyword argument ‘return_inferencedata’”

Also I do not understand what exactly happens here. Maybe you can help me by clarifying a bit what you propose and why?
The studies within one group are already connected by the shared (for each group) parameters alpha and beta. While you are introducing truly global shared alpha and betas. What was your thought process concerning this?
Why is the same ‘study_diff’ used for the different groups A and B?
How can I extend this approach if I had more studies in group A than in B?

nkaimcaudle · January 29, 2021, 1:41am

You are likely running an out dated version of PyMC3. I recommend that you update it.

My code isn’t necessarily the “correct” or “best” method, but it is one that allows you to have a single delta of rates between A and B.

In my model the global “ab” will impact each group’s first study. Then the global “study_diff” will be added to get the second study, the same diff being used in both groups. This approach would not work when you have groups with differing amount of studies. However in that case the concept of AB testing doesn’t apply. It can work for when you have more than 2 studies within all groups.

cluhmann · January 29, 2021, 2:54am

As @nkaimcaudle describes, you are likely using an older version of pymc3. Unless you are running an even older version, you are likely to be seeing warnings about pm.sample() returning an object of type inferencedata in future versions.

Regarding the interpretation of your original model, you seem to want to say something about group differences and are struggling because the thetas live at the study level (which is why you have 2 per group). So it sounds like what you really want to do is interpret the alphas and the betas describing the beta distribution from which the thetas are generated rather than the thetas themselves. This is often the case when using hierarchical models. The lower levels exist because we expect there to be random variation across “instances” (studies here, but it could be anything), but often want to abstract away from the specific instances we observed in our data when we try to draw general/generalizable inferences.

One way to do this would be to determine the expected value of theta for each credible value of alpha and beta (E[\theta]=\frac{\alpha}{\alpha + \beta}). Something like this:

ExpectTheta_A = pm.Deterministic(
                    'ExpectTheta_A',
                    (ab_A[0] / (ab_A[0] + ab_A[1]))
                )
ExpectTheta_B = pm.Deterministic(
                    'ExpectTheta_B',
                    (ab_B[0] / (ab_B[0] + ab_B[1]))
                )

Then you could take the difference between these values and interpret them as your group differences.

But I don’t necessary like this solution because it hides/ignores the uncertainty we have about the distribution of thetas we expect from individual studies within a single group (though this may exactly what you care about). If you are instead looking to say something about the expected differences within a pair of studies, each involving a separate group, then I might suggest something like what is below. It requires you to get your hands a bit dirtier, but allows you to really dig in and extract whatever insights you want.

import scipy.stats

# how many posterior draws to pull
n_ppc_draws = 100
ppcs_idx = np.random.choice(range(len(trace2)), 
                            size=n_ppc_draws)

# retrieve credible alpha and beta values for group A
alpha_A = trace2['ab_A'][ppcs_idx,0]
beta_A = trace2['ab_A'][ppcs_idx,1]
# generate credible theta values expected from
# *arbitrary studies* run with this group
# NOT just the 2 we happened to observe
# but the "universe" of studies we could possibly run
theta_A = scipy.stats.beta(alpha_A, beta_A).rvs()

Note that we could do this step of sampling multiple times per alpha_A/beta_A values, because each new (arbitrary) study is expected to yield new thetas, even for the same group. Alternatively, we could just take the group’s expected value of theta, like we did above:
ExpectTheta_A = alpha_A / (alpha_A + beta_A)

But be careful, no single study is expected to reflect this value of theta! This is just what we expect on average from studies run in/on this group. Individual studies will vary. How much? Well, the variance would be \frac{\alpha \beta}{(\alpha+\beta)^2(\alpha+\beta +1)} or:
(alpha_A * beta_A) / ( np.square(alpha_A + beta_A) * (alpha_A + beta_A + 1) )

Regardless, we can then do the same for group B.

alpha_B = trace2['ab_B'][ppcs_idx,0]
beta_B = trace2['ab_B'][ppcs_idx,1]
theta_B = scipy.stats.beta(alpha_B, beta_B).rvs()

So now what? Presumably we want to know something about the group-wise differences in the credible thetas.

# average difference in thetas from the 2 groups
np.mean(theta_A - theta_B)

# stdev of difference in thetas from the 2 groups
np.std(theta_A - theta_B)

Finally, I am getting loads of divergences when I run this model. I suspect it is related to the use of the half flat prior on the alphas and betas. It seems unlikely that you are really this uncertain and could probably do with a tighter prior.

Topic		Replies	Views
Hierarchical AB Testing Posterior Delta Questions	2	570	August 4, 2020
Hierarchical AB testing in PyMC3 Questions	4	2500	March 2, 2021
Best Practice AB Test Difference in Means Questions	2	594	February 14, 2019
Bayesian Hierarchical Modelling with multiple timepoints for patients as an outcome Questions	2	391	February 11, 2021
Noob question: Bayesian inference in AB test vs frequentist approach (N size) version agnostic doc , modeling	6	504	June 21, 2022

Hierarchical AB Testing Interpretation

Related topics