# Binomial AB Testing

Newbie here.

I’m trying to address a limitation in Bayesian AB tests. We’re assuming every observation is independent, but we often observe each customer more than once. I have a model that works on synthetic data, but it is very slow at realistic data volumes (a Beta-Binomial is a poor fit, so I don’t want to rely on conjugacy / Gibbs sampling).

I think there is redundancy in the example below because the conversion probability for each individual customer is sampled in the posterior. But I can’t find a way to draw independent samples from a uniform prior and feed those into the Binomial `p` parameter. If I give the uniform prior no shape, the model assumes all customers have the same conversion probability, `p`.

```python
import pymc as pm
import numpy as np

observations = [30] * 20  # attempts; each entry represents a new customer
occurrences = [2] * 10 + [28] * 10  # successes for the corresponding customers

posterior_sample = 2000  # draws to keep
burn = 1000  # burn-in draws to discard

with pm.Model() as model:
    p_t = pm.Uniform("test_distn", 0, 1, shape=(len(observations),))
    p_t_mean = pm.Deterministic("test", p_t.mean())
    obs_test = pm.Binomial("obs_test", n=np.array(observations), p=p_t, observed=occurrences)

    step = pm.Metropolis()
    trace = pm.sample(posterior_sample + burn, chains=2, step=step)
```

I also end up randomly flattening “test_distn” from the trace to see how the customers’ success probability is distributed. This works, but feels ugly/hacky.

This attempt to remove redundancy doesn’t work; the posterior for “test_distn” is clearly wrong:

```python
import pymc as pm
import numpy as np

observations = [30] * 20  # attempts; each entry represents a new customer
occurrences = [2] * 10 + [28] * 10  # successes for the corresponding customers

posterior_sample = 2000  # draws to keep
burn = 1000  # burn-in draws to discard

with pm.Model() as model:
    p_t = pm.Uniform("test_distn", 0, 1)
    p_t_n = pm.draw(p_t, draws=len(observations))  # draws fixed values, not free variables
    p_t_mean = pm.Deterministic("test", p_t.mean())
    obs_test = pm.Binomial("obs_test", n=np.array(observations), p=p_t_n, observed=occurrences)

    step = pm.Metropolis()
    trace = pm.sample(posterior_sample + burn, chains=2, step=step)
```

Any help would be much appreciated. An efficient solution here seems widely applicable, so hopefully others could benefit too.


Welcome!

There are several things here that seem less than ideal. First, you should probably stick to calling `pm.sample()` with no arguments so that it uses the default values for everything (this is particularly true for the `step` argument). And because `observations` is always 30, you don’t have to pass `n=np.array(observations)` into the binomial likelihood; you can just set `n=30`.

I’m not sure what you mean by “flattening” here. When I run the following model:

```python
import pymc as pm

n_flips = 30
observations = [n_flips] * 20
occurrences = [2] * 10 + [28] * 10

with pm.Model() as model:
    p_t = pm.Uniform("test_distn", 0, 1, shape=(len(observations),))
    obs_test = pm.Binomial("obs_test", n=n_flips, p=p_t, observed=occurrences)
    idata = pm.sample()
```

```python
idata.posterior["test_distn"].mean(dim=["chain", "draw"])
```

I get this:

```
<xarray.DataArray 'test_distn' (test_distn_dim_0: 20)> Size: 160B
array([0.09349925, 0.09319441, 0.09354546, 0.09385241, 0.09396087,
       0.09368252, 0.09379583, 0.09246933, 0.09423234, 0.09350552,
       0.9054838 , 0.90666993, 0.90597412, 0.90627215, 0.90680455,
       0.9063937 , 0.90617682, 0.90684242, 0.90594251, 0.90559609])
Coordinates:
  * test_distn_dim_0  (test_distn_dim_0) int64 160B 0 1 2 3 4 ... 15 16 17 18 19
```

which seems about right to me.

Happy to answer questions. I might also suggest checking out this notebook for some ideas about how to attack these sorts of models in a slightly more general manner.

Thanks for this. Really great to get advice from the community so quickly. It does run much quicker with the default arguments: the NUTS sampler and a smaller sample size.

You asked what I meant by flattening.

I had been accessing the posterior samples using `idata.posterior["test_distn"].data`, which has shape `(chains, posterior_sample_size, n_customers)`.

i.e. we’re learning the posterior for every customer individually, which doesn’t seem necessary to me. Perhaps this is unavoidable.

To learn the distribution amongst customers, I had been randomly selecting one customer from each posterior sample, reducing the shape to `(chains, posterior_sample_size)`.

The extraction you’ve shown appears to do a similar thing:
`idata.posterior["test_distn"].mean(dim=["chain", "draw"])`

I would encourage you to check out the notebook I linked to, as it takes a hierarchical approach. That is, it learns something about each customer, but uses what it learns to simultaneously learn something about all customers (which, in turn, informs what you learn about each individual customer).
