Creating custom Joint Distribution in pymc3

Hi all,

I’m using pymc3 to do some Bayesian A/B testing. The beauty of this approach, in contrast to traditional frequentist approaches, is that you can generate a variety of metrics and learn a lot more about your experiment.

I’m most interested in computing the error metrics and loss metrics as outlined in 6.1 and 6.2 of https://vwo.com/downloads/VWO_SmartStats_technical_whitepaper.pdf

This requires computing the joint distribution of the sampled posteriors, posterior(groupA) * posterior(groupB). I’ve got these sampled posteriors, but I’m unsure how to turn them into a joint distribution in pymc3.

Any help would be useful!

Welcome!

How did you sample the posteriors you have?

Hi,

Here’s the code. I’m currently working with simulated data from scipy.stats.binom.rvs().

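import pymc3 as pm

# Uniform Beta(1, 1) priors on both conversion rates
alpha_a = 1
beta_a = 1
alpha_b = 1
beta_b = 1
n = 10000  # number of trials behind each simulated observation

variants = ["Control", "Optimisation"]

# `data` holds the simulated binomial draws (not shown), keyed by variant name
with pm.Model() as ab_model:

    # Prior on the conversion rate of each variant
    theta_a = pm.Beta(variants[0], alpha=alpha_a, beta=beta_a)
    theta_b = pm.Beta(variants[1], alpha=alpha_b, beta=beta_b)

    # Binomial likelihood of the observed data
    data_a = pm.Binomial("A Obs", n=n, p=theta_a, observed=data["Control"])
    data_b = pm.Binomial("B Obs", n=n, p=theta_b, observed=data["Optimisation"])

    step = pm.NUTS()
    trace = pm.sample(10000, step=step, return_inferencedata=True)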

The probability that \lambda_B > \lambda_A (section 6.1) can be calculated (approximately) directly:

pBgtA = (trace.posterior['Optimisation'] > trace.posterior['Control']).mean()

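The expected loss would be something like this (as written, it is the expected shortfall from choosing Optimisation, i.e. the amount by which Control beats it, floored at zero):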

upliftLossA = (trace.posterior['Control'] - trace.posterior['Optimisation']).clip(min=0).mean()
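
For reference, a sketch of the quantities being approximated here, writing \lambda_A for Control and \lambda_B for Optimisation, and taking (\lambda_A^{(s)}, \lambda_B^{(s)}), s = 1, ..., S, to be the paired posterior draws. The loss definition is my reading of section 6.2 of the whitepaper, so treat it as an assumption:

```latex
\begin{aligned}
P(\lambda_B > \lambda_A \mid \text{data})
  &\approx \frac{1}{S} \sum_{s=1}^{S} \mathbf{1}\!\left[\lambda_B^{(s)} > \lambda_A^{(s)}\right], \\
\mathbb{E}\!\left[\max(\lambda_A - \lambda_B,\, 0) \mid \text{data}\right]
  &\approx \frac{1}{S} \sum_{s=1}^{S} \max\!\left(\lambda_A^{(s)} - \lambda_B^{(s)},\, 0\right).
\end{aligned}
```

Each pair of draws is already a sample from the joint posterior, so averaging a function of the pair over the draws approximates its expectation under the joint distribution. No separate joint density needs to be constructed.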

I may not have nailed the details, but hopefully that gives you some idea of the direction to go.
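
If it helps to see the same calculation without pymc3, here is a minimal standard-library sketch. It is not the model above: it swaps NUTS for direct draws from the conjugate Beta posterior (a Beta(1, 1) prior with a binomial likelihood gives Beta(1 + successes, 1 + failures)). The counts (1000 and 1100 successes out of 10000) are made up for illustration, and the function names are hypothetical.

```python
import random


def sample_beta_posterior(successes, trials, n_draws, rng, alpha=1.0, beta=1.0):
    """Draw from the conjugate Beta posterior of a binomial success rate."""
    return [
        rng.betavariate(alpha + successes, beta + trials - successes)
        for _ in range(n_draws)
    ]


def prob_b_beats_a(draws_a, draws_b):
    """Fraction of paired draws in which B's rate exceeds A's."""
    return sum(b > a for a, b in zip(draws_a, draws_b)) / len(draws_a)


def expected_loss(draws_chosen, draws_other):
    """Average shortfall of the chosen variant against the other, floored at 0."""
    total = sum(max(o - c, 0.0) for c, o in zip(draws_chosen, draws_other))
    return total / len(draws_chosen)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: 1000/10000 conversions for A, 1100/10000 for B
draws_a = sample_beta_posterior(successes=1000, trials=10000, n_draws=20000, rng=rng)
draws_b = sample_beta_posterior(successes=1100, trials=10000, n_draws=20000, rng=rng)

p_b_gt_a = prob_b_beats_a(draws_a, draws_b)  # estimate of P(B > A)
loss_a = expected_loss(draws_a, draws_b)     # expected loss if we choose A
loss_b = expected_loss(draws_b, draws_a)     # expected loss if we choose B

print(f"P(B > A) ~ {p_b_gt_a:.3f}")
print(f"expected loss choosing A ~ {loss_a:.5f}")
print(f"expected loss choosing B ~ {loss_b:.5f}")
```

The two lists are zipped draw by draw, which mirrors the elementwise comparison on trace.posterior: each pair is treated as one sample from the joint posterior.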

Two other things:

  1. You don’t need to explicitly create the step. pm.sample() will automatically infer what step method is necessary given your model.
  2. When asking for an InferenceData object, it’s conventional to call the return value idata (so that you remember that there is more than just the MCMC trace stored inside). Stylistic choice, but something you might want to be aware of.

Thanks!
