Strategies for speeding up power analyses

Thank you both for your responses so far!

Let’s say it’s a simple A/B test. Example: What’s the lift of some variant B over the original experience A (control)? How much time needs to pass for there to be enough observations to have an 80% probability of detecting a 5% relative effect in our metric of interest?
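For reference, here’s roughly how I’d frame that same sample-size question as a classical power calculation (a minimal sketch using statsmodels; the ~15% baseline rate and 5% relative lift are just illustrative assumptions, not a real metric):

# Classical sample-size calculation for the same question (illustrative numbers):
# baseline conversion ~15%, 5% relative lift, 80% power, 5% two-sided significance level.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_control = 0.15                  # assumed baseline conversion rate
p_variant = p_control * 1.05      # 5% relative lift

effect_size = proportion_effectsize(p_variant, p_control)  # Cohen's h
n_per_arm = NormalIndPower().solve_power(effect_size=effect_size,
                                         alpha=0.05, power=0.80,
                                         alternative="two-sided")
print("~{:,.0f} users per arm".format(n_per_arm))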

Would hierarchical models really work if there were only 2 contexts though? (variant A vs. variant B)

I definitely agree with those points you made, and the Bayesian summary statistics are a really helpful step in the right direction.

The approach I typically use (which I’ll show below) I assume does a slightly better job than the typical frequentist NHST, because calculating P(B > A | data, priors) from a Bayesian posterior is more meaningful than a frequentist p-value.
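As a concrete illustration of what I mean (a minimal sketch, assuming the same Beta(15, 85) priors as in the model below and made-up conversion counts), that probability can be read straight off the conjugate Beta posteriors without any MCMC:

# Minimal sketch: P(B > A | data, priors) from conjugate Beta posteriors.
# Priors match the Beta(15, 85) used in the model below; the counts are made up.
import numpy as np

rng = np.random.default_rng(0)
a_prior, b_prior = 15, 85
conv_A, n_A = 150, 1000   # hypothetical conversions / users in A
conv_B, n_B = 170, 1000   # hypothetical conversions / users in B

post_A = rng.beta(a_prior + conv_A, b_prior + n_A - conv_A, size=100_000)
post_B = rng.beta(a_prior + conv_B, b_prior + n_B - conv_B, size=100_000)
print("P(B > A | data, priors) ~ {:.3f}".format((post_B > post_A).mean()))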

Taking a step back, in general I’d say the question “how long / how many trials would it take to find some meaningful difference between control and variant that we’re x% sure of?” is a fair and valuable question in any domain (frequentist or Bayesian). I don’t quite understand why a Bayesian approach would treat this as not a meaningful question (am I misinterpreting here?). Is there a better way to answer this question in a Bayesian context than the typical power analysis?

Here’s a full example of how I approach a power analysis now (I originally used the format mentioned in the first post, but just came up with this idea, which seems to be faster). This is a simple example; typically I’d have more parameters / user traits and might have a regression in there. Note: I avoided aggregating this into counts of the form `Binomial(n, p)` and instead chose `Binomial(1, p)`, since typically I’ll have other user-level data, and this will more accurately represent the speed I’d see when simulating power. This takes hours if I were to have 1000 simulated datasets, which is the case I’d like to solve for.

import pandas as pd
import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt


# Original aggregated version (counts of the form Binomial(n, p)), kept for reference:
# def generate_data(n, p, p_diff=0.01):
#     A = np.random.binomial(n, p, size=1)
#     B = np.random.binomial(n, p+p_diff, size=1)
#     return A, B

# def create_data(n, p, p_diff=0.01, n_datasets=100):
#     convs = np.squeeze([generate_data(n, p, p_diff) for i in range(n_datasets)])
#     return pd.DataFrame({"idx": range(n_datasets), "A": convs[:, 0], "B": convs[:, 1], "N": n})\
#                 .set_index(["idx", "N"])\
#                 .stack().reset_index()\
#                 .rename(columns={"level_2": "variant", 0: "convert"})
    
    
    
def generate_data_user_level(n, p, p_diff=0.01):
    # One simulated dataset: n user-level Bernoulli outcomes per variant
    A = np.random.binomial(1, p, size=n)
    B = np.random.binomial(1, p + p_diff, size=n)
    return A, B

def create_data_user_level(n, p, p_diff=0.01, n_datasets=100):
    # Stack n_datasets simulated datasets into one long dataframe with columns:
    # idx (dataset), N (=1, user-level), variant (A/B), convert (0/1)
    convs = np.concatenate([generate_data_user_level(n, p, p_diff) for i in range(n_datasets)], axis=-1).T
    return pd.DataFrame({"idx": np.repeat(range(n_datasets), n), "A": convs[:, 0], "B": convs[:, 1], "N": 1})\
                .set_index(["idx", "N"])\
                .stack().reset_index()\
                .rename(columns={"level_2": "variant", 0: "convert"})
    
    
# Parameters
n_datasets = 100
n = 1000
a_prior = 15   # Beta prior "successes"
b_prior = 85   # Beta prior "failures"
alpha = 0.1    # decision threshold: call B a winner if P(B > A | data) > 1 - alpha

# Simulating data
data = create_data_user_level(n, 0.15, 0.01, n_datasets=n_datasets)

# Index variables for the model
variant_, variant_labels = pd.factorize(data.variant)
dataset_ = data.idx.values
N_variant = len(variant_labels)
print(data.head())

# Model
with pm.Model() as m1:
    # One conversion rate per (simulated dataset, variant) pair
    p = pm.Beta("p", a_prior, b_prior, shape=(n_datasets, N_variant))
    conv = pm.Binomial("conv", n=data.N.values, p=p[dataset_, variant_], observed=data.convert.values)
    trace = pm.sample(return_inferencedata=True)
    
# Evaluating Power
p_ = trace.posterior["p"].values.reshape(-1, n_datasets, 2)  # (chain*draw, dataset, variant)
A = p_[:,:,0]
B = p_[:,:,1]

# Power = fraction of simulated datasets where we'd call B a winner,
# i.e. where P(B > A | data) exceeds 1 - alpha.
# Modeling these successes/failures with a beta distribution here would
# make more sense so I'd have a distribution of Power, but this is a quick example.
power = ((B > A).mean(axis=0) > 1 - alpha).mean()
print("Power: {:.3f}".format(power))