Six months ago I ran a test on platform A that had favorable results on a rate metric (a conversion metric). I want to run a simulation that shows the possible outcomes of the same test on platform B.
Things to know:
- Platforms A and B have different prices
- Measured lift on platform A
- I know that platform B has lower conversion rates in general
- I suspect that test lifts on platform B will be lower than the measured lifts on platform A
Since A and B have different prices, I want to estimate the possible revenue before running the test, to decide whether the experiment is worth running. My PyMC skills are not at the level I would like, and my statistics is not the best either.
Here is a NumPy approach:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
N = 10000
cr_B = 0.22 # conversion rate before test
lift_A = 0.15 # measured lift of the test on platform A
prc_A = 64
prc_B = 45
INSTALLS = 50000
# The lift can be modeled with a beta distribution
# I use "small" values for alpha and beta to have more variance in the distribution of possible lifts
dist_lift = np.random.beta(22,78,N)
# keep only the values that are no larger than the lift on platform A, since platform B performs worse
dist_lift = dist_lift[dist_lift<=lift_A]
# resampling to get back to the original number of lift points
dist_lift = np.concatenate((
dist_lift,
np.random.choice(dist_lift,size=N-len(dist_lift))
))
# CONVERSION RATE FOR PLATFORM B
# I have this data and I'm confident in it, so I use the actual numbers for the beta dist.
disr_cr_real = np.random.beta(950,4318,N)
# SIMULATION FOR PLATFORM B
# multiply the conversions of B with the distribution of lifts
dist_cr_simm = disr_cr_real * (dist_lift +1)
# FROM CONVERSION TO USERS TO REVENUE
rev_real = (disr_cr_real * INSTALLS) * prc_B
rev_simm = (dist_cr_simm * INSTALLS) * prc_A
fig, ax = plt.subplots(figsize=(10,5))
# `fill=True` replaces the deprecated `shade=True` in recent seaborn versions
sns.kdeplot(rev_real, ax=ax, label='Real', fill=True)
sns.kdeplot(rev_simm, ax=ax, label='Sim', fill=True)
ax.legend()
It looks nice; the problem is that I don't know whether I can trust this approach.
The conversion rate on platform B shouldn't be greater than the one on platform A.
Is there a better way to get a distribution for the lift than resampling a filtered beta distribution?
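For context, one candidate alternative is to sample the truncated beta directly through its inverse CDF instead of filtering and resampling. With Beta(22, 78) only a small share of the raw draws (roughly a few percent) fall below 0.15, so in the filtered version most of the final N points are repeats of the few survivors. The following is a minimal sketch, not a vetted solution; it assumes SciPy is available, and the names `LIFT_A`, `ALPHA`, `BETA` and `p_upper` are made up for illustration.

```python
import numpy as np
from scipy import stats

N = 10_000
LIFT_A = 0.15          # measured lift on platform A, used as the upper bound
ALPHA, BETA = 22, 78   # same shape parameters as in the NumPy code

rng = np.random.default_rng(42)

# CDF at the upper bound: share of the untruncated beta that lies below LIFT_A
p_upper = stats.beta.cdf(LIFT_A, ALPHA, BETA)

# Draw uniforms on [0, p_upper) and map them back through the inverse CDF,
# which yields N distinct draws from the beta truncated at LIFT_A
u = rng.uniform(0.0, p_upper, size=N)
dist_lift = stats.beta.ppf(u, ALPHA, BETA)
```

Every draw is a fresh value from the truncated distribution, so there are no duplicated points and no second resampling step.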
How do I code it in PyMC?
Your help and ideas are much appreciated.
Thanks