Hierarchical AB Testing Interpretation

As @nkaimcaudle describes, you are likely using an older version of PyMC3. Unless you are running an even older version, you are probably seeing warnings that pm.sample() will return an InferenceData object by default in future versions.

Regarding the interpretation of your original model, you seem to want to say something about group differences but are struggling because the thetas live at the study level (which is why you have 2 per group). So it sounds like what you really want to do is interpret the alphas and betas describing the beta distribution from which the thetas are generated, rather than the thetas themselves. This is often the case when using hierarchical models: the lower levels exist because we expect random variation across “instances” (studies here, but it could be anything), but we often want to abstract away from the specific instances we observed in our data when drawing general/generalizable inferences.

One way to do this would be to compute the expected value of theta for each credible pair of alpha and beta values (E[\theta]=\frac{\alpha}{\alpha + \beta}). Something like this:

# track the expected theta for each group; ab[0] is alpha, ab[1] is beta
ExpectTheta_A = pm.Deterministic('ExpectTheta_A', ab_A[0] / (ab_A[0] + ab_A[1]))
ExpectTheta_B = pm.Deterministic('ExpectTheta_B', ab_B[0] / (ab_B[0] + ab_B[1]))

Then you could take the difference between these values and interpret it as your group difference.
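For example, something like this (a minimal sketch; DiffExpectTheta is a name I am making up, and it assumes the two Deterministics above are defined in the same model context):

# hypothetical: track the group difference directly in the trace
DiffExpectTheta = pm.Deterministic('DiffExpectTheta',
                                   ExpectTheta_A - ExpectTheta_B)

After sampling you can then summarize trace2['DiffExpectTheta'] directly, e.g. by checking how much of its posterior mass sits above zero.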

But I don’t necessarily like this solution, because it hides/ignores the uncertainty we have about the distribution of thetas we expect from individual studies within a single group (though this may be exactly what you care about). If you are instead looking to say something about the expected differences within a pair of studies, each involving a separate group, then I might suggest something like what is below. It requires you to get your hands a bit dirtier, but allows you to really dig in and extract whatever insights you want.

import numpy as np
import scipy.stats

# how many posterior draws to pull
n_ppc_draws = 100
ppcs_idx = np.random.choice(len(trace2), size=n_ppc_draws)

# retrieve credible alpha and beta values for group A
alpha_A = trace2['ab_A'][ppcs_idx,0]
beta_A = trace2['ab_A'][ppcs_idx,1]
# generate credible theta values expected from
# *arbitrary studies* run with this group
# NOT just the 2 we happened to observe
# but the "universe" of studies we could possibly run
theta_A = scipy.stats.beta(alpha_A, beta_A).rvs()

Note that we could repeat this sampling step multiple times for each pair of alpha_A/beta_A values (see the sketch below), because each new (arbitrary) study is expected to yield a new theta, even for the same group. Alternatively, we could just take the group’s expected value of theta, like we did above:
ExpectTheta_A = alpha_A / (alpha_A + beta_A)
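Here is roughly what the repeated-sampling version might look like (a sketch; n_studies_per_draw is a name I am inventing, and SciPy broadcasts the extra leading dimension over the alpha/beta arrays):

n_studies_per_draw = 5  # hypothetical: new studies to simulate per posterior draw
# shape (n_studies_per_draw, n_ppc_draws); each column reuses one (alpha, beta) pair
theta_A_many = scipy.stats.beta(alpha_A, beta_A).rvs(size=(n_studies_per_draw, n_ppc_draws))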

But be careful: no single study is expected to reflect ExpectTheta_A exactly! That is just what we expect on average from studies run in/on this group. Individual studies will vary. How much? The variance would be \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}, or in code:
var_theta_A = (alpha_A * beta_A) / (np.square(alpha_A + beta_A) * (alpha_A + beta_A + 1))

Regardless, we can then do the same for group B.

alpha_B = trace2['ab_B'][ppcs_idx,0]
beta_B = trace2['ab_B'][ppcs_idx,1]
theta_B = scipy.stats.beta(alpha_B, beta_B).rvs()

So now what? Presumably we want to know something about the group-wise differences in the credible thetas.

# average difference in thetas from the 2 groups
np.mean(theta_A - theta_B)

# stdev of difference in thetas from the 2 groups
np.std(theta_A - theta_B)
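
If you want a single summary number, the share of simulated study pairs in which group A comes out ahead gives you a posterior probability of superiority (using the theta_A/theta_B arrays from above):

# posterior probability that a new group-A study yields a higher theta
# than a matched new group-B study
np.mean(theta_A > theta_B)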

Finally, I am getting loads of divergences when I run this model. I suspect it is related to the use of the HalfFlat priors on the alphas and betas. It seems unlikely that you are really this uncertain, and you could probably do with a tighter prior.
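For example (just a sketch, not a recommendation tuned to your data; the Gamma hyperparameters here are placeholders you would want to adjust), something like this keeps the alphas and betas positive without being flat out to infinity:

# hypothetical weakly informative replacement for the HalfFlat priors
ab_A = pm.Gamma('ab_A', alpha=2.0, beta=0.5, shape=2)
ab_B = pm.Gamma('ab_B', alpha=2.0, beta=0.5, shape=2)

In my experience, even mildly informative priors like these tend to reduce divergences considerably.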
