Simulating an A/B test outcome with prior data from the same test on a different platform

Six months ago I ran a test on platform A that showed favorable results on a rate metric (conversion rate). I want to run a simulation that shows the possible outcomes of the same test on platform B.

Things to know:

  • Platforms A and B have different prices
  • I have the measured lift on platform A
  • I know that platform B has lower conversion rates in general
  • I suspect that test lifts on platform B will be lower than the measured lifts on platform A

Since A and B have different prices, I want to estimate the possible revenue before running the test, to decide whether the experiment is worth running. My PyMC skills are not at the level I would like, and my statistics is not the strongest either.
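As a quick sanity check before any simulation, the best case can be worked out by hand: revenue = installs × conversion rate × price, and the extra revenue is that baseline times the lift. This is only a rough sketch using the figures from the script below. It assumes that the Beta(950, 4318) parameters are conversions and non-conversions on platform B, and that the lift measured on platform A is a ceiling for platform B.

```python
# Back-of-the-envelope upper bound, using the figures from the script below.
installs = 50_000            # INSTALLS
price_b = 45                 # prc_B
# Assumption: 950 conversions and 4318 non-conversions on platform B.
cr_b = 950 / (950 + 4318)
# Assumption: the lift measured on platform A is the most B could achieve.
max_lift = 0.15

baseline_revenue = installs * cr_b * price_b
max_extra_revenue = baseline_revenue * max_lift

print(f"baseline revenue:     {baseline_revenue:,.0f}")
print(f"upper-bound increase: {max_extra_revenue:,.0f}")
```

If even this upper bound is too small to justify the experiment, the full simulation is probably not needed.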

Here is a NumPy approach:

import numpy as np 
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt

N = 10000
cr_B = 0.22 # conversion rate before test
lift_A = 0.15 # measured lift of the test on platform A

prc_A = 64
prc_B = 45 
 
INSTALLS = 50000

# The lift can be modeled with a beta distribution
# I use "small" values for alpha and beta to get more variance in the distribution of possible lifts
dist_lift = np.random.beta(22,78,N)

# keep only the values that are no larger than the lift on platform A, since platform B performs worse
dist_lift = dist_lift[dist_lift<=lift_A]

# resampling to get back to the original number of lift points 
dist_lift = np.concatenate((
    dist_lift, 
    np.random.choice(dist_lift,size=N-len(dist_lift))
    ))

# CONVERSION RATE FOR PLATFORM B
# I have this data and I'm confident in it, so I use the actual numbers for the beta distribution
disr_cr_real = np.random.beta(950,4318,N)

# SIMULATION FOR PLATFORM B
# multiply the conversions of B with the distribution of lifts 
dist_cr_simm = disr_cr_real * (dist_lift +1)

# FROM CONVERSION RATE TO CONVERTED USERS TO REVENUE
# both scenarios are on platform B, so both use platform B's price
rev_real = (disr_cr_real * INSTALLS) * prc_B
rev_simm = (dist_cr_simm * INSTALLS) * prc_B


fig, ax = plt.subplots(figsize=(10,5))
# fill=True replaces the deprecated shade=True
sns.kdeplot(rev_real, ax=ax, label='Real', fill=True)
sns.kdeplot(rev_simm, ax=ax, label='Simulated', fill=True)
ax.set_xlabel('Revenue')
ax.legend()
plt.show()

It looks nice, but the problem is that I don't know whether I can trust this approach.
The conversion rate on platform B shouldn't be greater than that on platform A.
Is there a better way to get a distribution for the lift than resampling a filtered beta distribution?
How would I code this in PyMC?
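
One alternative I can imagine to the filter-and-resample step is drawing from the truncated beta directly via inverse-CDF sampling, which avoids the duplicated values that resampling introduces. This is a minimal sketch, assuming SciPy is available. The seed and the variable names `rng`, `prior` and `u` are my own choices, and I am not sure it is the right approach either:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)   # arbitrary seed, only for reproducibility

N = 10_000
lift_A = 0.15                     # measured lift on platform A, used as upper bound
prior = stats.beta(22, 78)        # same shape parameters as in the script above

# Inverse-CDF sampling of a beta truncated to [0, lift_A]:
# draw u uniformly on [0, F(lift_A)) and map it back through the
# quantile function, so every draw falls inside the allowed range.
u = rng.uniform(0.0, prior.cdf(lift_A), size=N)

# np.minimum guards against floating-point round-off at the boundary.
dist_lift = np.minimum(prior.ppf(u), lift_A)

print(dist_lift.min(), dist_lift.mean(), dist_lift.max())
```

The resulting `dist_lift` could be dropped into the rest of the script unchanged. Note that truncating Beta(22, 78) at 0.15 keeps only its lower tail, so most of the mass ends up close to the 0.15 bound.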

Your help and ideas are much appreciated.

Thanks