Creating custom Joint Distribution in pymc3

jules · May 10, 2022, 1:46pm

Hi all,

I’m using pymc3 to do some Bayesian A/B testing. The beauty of this approach, in contrast to traditional frequentist approaches, is that you can generate a variety of metrics and learn a hell of a lot more about your experiment.

I’m most interested in computing the error metrics and loss metrics as outlined in 6.1 and 6.2 of https://vwo.com/downloads/VWO_SmartStats_technical_whitepaper.pdf

This requires computing the joint distribution of the sampled posteriors, posterior(groupA) * posterior(groupB). I’ve got these sampled posteriors, but I’m unsure how to combine them into a joint distribution in pymc3.

Any help would be useful!

cluhmann · May 10, 2022, 6:02pm

Welcome!

How did you sample the posteriors you have?

jules · May 11, 2022, 9:43am

Hi,

Here’s the code. I’m currently working with simulated data from scipy.stats.binom.rvs()

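import pymc3 as pm

# Beta(1, 1) (uniform) priors for both variants
alpha_a = 1
beta_a = 1
alpha_b = 1
beta_b = 1
n = 10000  # number of trials per variant

variants = ["Control", "Optimisation"]

# `data` (defined elsewhere) holds the simulated success counts, keyed by variant name
with pm.Model() as ab_model:
    
    # Conversion rate of each variant
    theta_a = pm.Beta(variants[0], alpha = alpha_a, beta = beta_a)
    theta_b = pm.Beta(variants[1], alpha = alpha_b, beta = beta_b)
    
    # Binomial likelihood of the observed counts
    data_a = pm.Binomial("A Obs", n = n, p = theta_a, observed = data["Control"])
    data_b = pm.Binomial("B Obs", n = n, p = theta_b, observed = data["Optimisation"])
    
    step = pm.NUTS()
    trace = pm.sample(10000, step = step, return_inferencedata=True)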

cluhmann · May 11, 2022, 5:27pm

The probability that \lambda_B > \lambda_A (section 6.1) can be calculated (approximately) directly:

pBgtA = (trace.posterior['Optimisation'] > trace.posterior['Control']).mean()

The expected loss would be something like this:

upliftLossA = (trace.posterior['Optimisation'] - trace.posterior['Control']).clip(min=0).mean()

I may not have nailed the details, but hopefully that gives you some idea of the direction to go.
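On the joint distribution itself: in this model the two variants have separate priors and separate likelihoods, so the joint posterior factorises, and pairing the i-th draw of one rate with the i-th draw of the other already gives a sample from the joint. Here is a minimal NumPy-only sketch of that idea. It does not use pymc3 at all: it samples the conjugate Beta posteriors directly, and the conversion counts are made-up numbers for illustration, not values from this thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed conversions out of n trials per variant (made-up numbers).
n = 10_000
successes_a = 1_000  # Control
successes_b = 1_100  # Optimisation

# With a Beta(1, 1) prior and a binomial likelihood, each posterior is
# Beta(1 + successes, 1 + failures), so it can be sampled directly.
draws = 50_000
theta_a = rng.beta(1 + successes_a, 1 + n - successes_a, size=draws)
theta_b = rng.beta(1 + successes_b, 1 + n - successes_b, size=draws)

# Row i is one draw (theta_a[i], theta_b[i]) from the joint posterior.
joint = np.column_stack([theta_a, theta_b])

# Probability that B beats A, estimated as the fraction of joint draws where it does.
p_b_gt_a = np.mean(joint[:, 1] > joint[:, 0])

# Expected loss of each choice: the average shortfall relative to the other variant.
loss_choose_a = np.mean(np.maximum(joint[:, 1] - joint[:, 0], 0.0))
loss_choose_b = np.mean(np.maximum(joint[:, 0] - joint[:, 1], 0.0))

print(f"P(B > A) = {p_b_gt_a:.3f}")
print(f"expected loss if choosing A = {loss_choose_a:.5f}")
print(f"expected loss if choosing B = {loss_choose_b:.5f}")
```

With an MCMC trace the same expressions apply unchanged, because each draw in the trace is already a joint sample of all the model's parameters.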

Two other things:

You don’t need to explicitly create the step. pm.sample() will automatically infer what step method is necessary given your model.
When asking for an InferenceData object, it’s conventional to call the return value idata (so that you remember that there is more than just the MCMC trace stored inside). Stylistic choice, but something you might want to be aware of.

jules · May 12, 2022, 10:31am

Thanks!
