Revenue metric for ab testing

hi, i am new to pymc and was able to build a model for conversion rate metric A/B testing
but now, i am having to implement a model for revenue for A/B testing and i really dont know to model it in pymc. i am following this paper https://cdn2.hubspot.net/hubfs/310840/VWO_SmartStats_technical_whitepaper.pdf

i am modelling the bernoulli as binomial and inputs for model are conversions, sessions and total money, but i am stucked

A_money = 5000
A_conversions = 200
A_sessions = 10000
A_theta = 1/(1+A_money)

with pm.Model() as model:
   prior_control = pm.Beta('prior_control', alpha=1, beta=1) #uninformative prior
   prior_money_control = pm.Gamma()

   #  likelihood
   like_control = pm.Binomial('like_control', n=A_sessions, p=prior_control, observed=A_conversions)
   like_money_control = pm.Exponential('like_money_control', )

   # alpha
   #alpha = pm.Exponential('alpha',)
   # beta
   #beta = pm.Exponential('beta',)
   #g = pm.Gamma('g', alpha=alpha, beta=beta, observed=bw)

   trace = pm.sample(draws=10000, tune=1000, discard_tuned_samples=True, progressbar=True)

thank you in advance

2 Likes

So this paper makes some very particular choices for prior distributions on the parameters; in particular it uses conjugate priors which can be fit explicitly.

Taking a look at the model, the most useful assumption is that purchase price is independent of purchase probability. This allows the posterior distribution to factor as:

P[\lambda, \theta | n, c, r] = P[\lambda | n, c]P[\theta | r]

The first of these is Beta (as Beta is conjugate to the bernoulli); and the second is Gamma (as gamma is conjugate to the exponential). By using the closed-form posteriors, the paper gets around MCMC entirely; so all the informatic questions (confidence intervals &c) can be obtained by using the regular samplers available in scipy.

If you want to explore arbitrary priors, you can set up a model like:

import pymc3 as pm
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sbn

lambda_true = 0.082
theta_true = 0.431
n_users=10
with pm.Model() as model_true:
    alpha = pm.Bernoulli('alpha', lambda_true, shape=n_users)
    r = pm.Exponential('r', theta_true, shape=n_users)
    user_rev = pm.Deterministic('revenue', alpha * r)
    test_data = pm.sample_prior_predictive(200)
    
plt.plot(test_data['revenue'][:,0])  # revenue from user 1

image

and perform inference like:

obs_purchase = test_data['alpha'][:,0]
obs_cost = test_data['revenue'][:,0]
obs_cost = obs_cost[obs_cost > 0]
with pm.Model() as model_infer:
    lam = pm.Beta('lambda', 1., 1.)
    thet = pm.HalfNormal('theta', 1.)
    alpha = pm.Bernoulli('alpha', lam, observed=obs_purchase, shape=1)
    r = pm.Exponential('r', thet, observed=obs_cost, shape=1)
    alpha_pred = pm.Bernoulli('alpha_p', lam)
    r_pred = pm.Exponential('r_p', thet)
    revenue = pm.Deterministic('predicted_revenue', r_pred * alpha_pred)
    trace = pm.sample(500, chains=3)
    
pm.traceplot(trace);

Note that using pm.Gamma as the prior on theta should give you the same posterior in the paper – though yours will be approximate (from MCMC) as opposed to exact.

3 Likes

@chartl , i really appreciate your help! nice explanation. i was trying to reproduce it without using arrays, just having aggregated info like this:

A_money = 5000
A_conversions = 200
A_sessions = 10000

so, thats why i am changing bernoulli to binomial
as this calculator does: https://vidogreg.shinyapps.io/bayes-arpu-test/
based on the same paper

but, it is not working as expected, i dont know what i am doing wrong, i tried lot of things :grimacing:
sorry for my first steps on this

with pm.Model() as model_infer:
    lam = pm.Beta('lambda', 1., 1.)
    #thet = pm.HalfNormal('theta', 1.)
    thet = pm.Gamma('theta', 1., 1.) #using gamma as you suggested

    #alpha = pm.Bernoulli('alpha', lam, observed=obs_purchase, shape=1)
    alpha = pm.Binomial('alpha', n=A_sessions, p=lam, observed=A_conversions)

    #r = pm.Exponential('r', thet, observed=obs_cost, shape=1)
    r = pm.Exponential('r', thet, observed=A_theta, shape=A_conversions)

    alpha_pred = pm.Bernoulli('alpha_p', lam)
    #alpha_pred = pm.Binomial('alpha_p', n=A_conversions, p=lam)

    r_pred = pm.Exponential('r_p', thet)

    revenue = pm.Deterministic('predicted_revenue', r_pred * alpha_pred)

    trace2 = pm.sample(10000, chains=4)

This is all correct except for the shape in r:

A_money = 5000
A_conversions = 200
A_sessions = 10000

TH=0.40
A_theta = np.random.exponential(1./TH, size=(A_conversions,))

with pm.Model() as model_infer:
    lam = pm.Beta('lambda', 1., 1.)
    thet = pm.Gamma('theta', 1., 1.)
    alpha = pm.Binomial('alpha', n=A_sessions, p=lam, observed=A_conversions)
    r = pm.Exponential('r', thet, observed=A_theta)
    alpha_pred = pm.Bernoulli('alpha_p', lam)
    r_pred = pm.Exponential('r_p', thet)
    revenue = pm.Deterministic('predicted_revenue', r_pred * alpha_pred)
    trace2 = pm.sample(500, chains=4)

pm.traceplot(trace2);

thank you again @chartl

this

is it arbitrary? does it come from A_conversions / A_money? anyway it gives me 0.04

thank you!

Don’t worry about it – I just use it to generate A_theta since I didn’t have any data. The parameter is just the exponential parameter: and where pymc3 uses theta, numpy uses 1/theta.

thank you @chartl
but for example, can i get A_theta from A_money?
because that exponential distribution should come from spent money
you wrote
A_theta = np.random.exponential(1./A_theta, size=(A_conversions,))
i also tried
A_theta = np.random.exponential(1./A_money, size=(A_conversions,))

but it didnt work as expected
thank you for your patience and time

So A_money is the total money spent over all conversions right? The problem here is that you aren’t actually observing how much money is spent on each transaction; but only the total money.

It turns out that a sum of k exponential distributions is in fact a gamma distribution \Gamma(k, \theta)

so to do this

A_money = 5000
A_conversions = 200
A_sessions = 10000

with pm.Model() as model_infer:
    lam = pm.Beta('lambda', 1., 1.)
    thet = pm.Gamma('theta', 1., 1.)

    alpha = pm.Binomial('alpha', n=A_sessions, p=lam, observed=A_conversions)

    r = pm.Gamma('r', A_conversions, thet, observed=A_money)

    alpha_pred = pm.Bernoulli('alpha_p', lam)

    r_pred = pm.Exponential('r_p', thet)

    revenue = pm.Deterministic('predicted_revenue', r_pred * alpha_pred)

    trace2 = pm.sample(500, chains=4)

pm.traceplot(trace2);

hi @chartl,
“The problem here is that you aren’t actually observing how much money is spent on each transaction”
which would be this problem? it is that i loss precision in the model?

You lose some modeling flexibility. On the one hand you can choose to model the distribution of spending directly

r_i \sim \mathrm{Exp}(\theta) or r_i \sim \mathrm{Pareto}(\alpha) -1

which requires the vector of money spent. This lets you use any distribution to infer the parameter and predict future expenditure.

Alternatively, you can choose to model the total spending

R = \sum_{i=1}^k r_i \sim \Gamma(\theta, k)

to infer the parameters, but now if you want to predict future expenditure (the distribution of r_i), you need some link between the parameters of P(R|\theta) and the parameters of P(r_i|\theta). This limits the distributions to those with closed-form sums, such as the exponential (sum of i.i.d. exponentials is gamma), normal (sum of normals is normal), or gamma (sum of gammas is gamma); but not things like lognormal or power.

perfect @chartl
understood
but, taking account scalibility, i am modelling a big ecommerce site, money spent vector could have 15 millons items.
i tried pymc3 with such a big numbers and it is very slow, or maybe i am doing something wrong :grimacing:

Hey Raul,

I guess you never have to use the 15 million items. The idea is that you are doing an AB-test on a certain topic on the website right? You only collect items from both groups until you are certain enough that one of both is better than the other. 15 million items would be far more than you would need for such a test.

When comparing the above described approach with the pymc guide on AB testing the revenue is modeled a bit differently in the sense that in the guide the theta from the Beta distribution is used while here we use the theta to model the conversion via Binomial and Bernoulli which also leads to different results for revenue_per_visitor (mean and especially sd).

Above:

alpha = pm.Binomial('alpha', n=A_sessions, p=lam, observed=A_conversions)
r = pm.Gamma('r', A_conversions, thet, observed=A_money)
alpha_pred = pm.Bernoulli('alpha_p', lam)
r_pred = pm.Exponential('r_p', thet)
revenue_per_visitor  = pm.Deterministic('predicted_revenue', r_pred * alpha_pred)

vs.

Guide:

revenue_per_visitor = pm.Deterministic("revenue_per_visitor", thet * (1 / lam))

What is the reason for these differences? Is one of them wrong?