Revenue metric for A/B testing

hi, i am new to pymc and was able to build a model for A/B testing a conversion rate metric,
but now i have to implement a model for revenue A/B testing and i really don't know how to model it in pymc. i am following this paper https://cdn2.hubspot.net/hubfs/310840/VWO_SmartStats_technical_whitepaper.pdf

i am modelling the Bernoulli as a Binomial, and the inputs to the model are conversions, sessions and total money, but i am stuck

A_money = 5000
A_conversions = 200
A_sessions = 10000
A_theta = 1/(1+A_money)

with pm.Model() as model:
   prior_control = pm.Beta('prior_control', alpha=1, beta=1) #uninformative prior
   prior_money_control = pm.Gamma()

   #  likelihood
   like_control = pm.Binomial('like_control', n=A_sessions, p=prior_control, observed=A_conversions)
   like_money_control = pm.Exponential('like_money_control', )

   # alpha
   #alpha = pm.Exponential('alpha',)
   # beta
   #beta = pm.Exponential('beta',)
   #g = pm.Gamma('g', alpha=alpha, beta=beta, observed=bw)

   trace = pm.sample(draws=10000, tune=1000, discard_tuned_samples=True, progressbar=True)

thank you in advance


So this paper makes some very particular choices for the prior distributions on the parameters; in particular, it uses conjugate priors, so the posteriors can be written down explicitly.

Taking a look at the model, the most useful assumption is that purchase price is independent of purchase probability. This allows the posterior distribution to factor as:

P[\lambda, \theta | n, c, r] = P[\lambda | n, c]P[\theta | r]

The first of these is a Beta (the Beta is conjugate to the Bernoulli), and the second is a Gamma (the Gamma is conjugate to the exponential). By using the closed-form posteriors, the paper sidesteps MCMC entirely, so all the inferential quantities (credible intervals, etc.) can be obtained from the standard distributions available in scipy.
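
For concreteness, here is a minimal sketch of that closed-form route in scipy, assuming the same uninformative Beta(1, 1) / Gamma(1, 1) priors used elsewhere in this thread and the aggregated A_* numbers from the first post:

import numpy as np
from scipy import stats

# aggregated data for variant A
A_sessions, A_conversions, A_money = 10000, 200, 5000.

# Beta(1, 1) prior on the conversion rate lambda; Beta is conjugate to the Bernoulli/Binomial,
# so the posterior is Beta(1 + conversions, 1 + non-conversions)
post_lambda = stats.beta(1 + A_conversions, 1 + A_sessions - A_conversions)

# Gamma(1, 1) prior on the exponential rate theta; Gamma is conjugate to the exponential,
# so the posterior is Gamma(1 + conversions, rate = 1 + total revenue)
# (scipy parameterizes the gamma by scale = 1/rate)
post_theta = stats.gamma(a=1 + A_conversions, scale=1. / (1. + A_money))

print(post_lambda.interval(0.95))   # 95% credible interval for lambda, no MCMC needed
print(post_theta.mean())            # posterior mean of theta
arpu = post_lambda.rvs(10000) / post_theta.rvs(10000)   # revenue per session = lambda / theta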

If you want to explore arbitrary priors, you can set up a model like:

import pymc3 as pm
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sbn

lambda_true = 0.082   # true conversion probability
theta_true = 0.431    # true rate of the exponential spend-per-purchase distribution
n_users = 10
with pm.Model() as model_true:
    alpha = pm.Bernoulli('alpha', lambda_true, shape=n_users)
    r = pm.Exponential('r', theta_true, shape=n_users)
    user_rev = pm.Deterministic('revenue', alpha * r)
    test_data = pm.sample_prior_predictive(200)
    
plt.plot(test_data['revenue'][:,0])  # revenue from user 1

[figure: simulated revenue for user 1 over the 200 prior-predictive draws]

and perform inference like:

obs_purchase = test_data['alpha'][:,0]     # conversion indicator for user 1 in each simulated draw
obs_cost = test_data['revenue'][:,0]
obs_cost = obs_cost[obs_cost > 0]          # keep only the draws where a purchase actually happened
with pm.Model() as model_infer:
    lam = pm.Beta('lambda', 1., 1.)
    thet = pm.HalfNormal('theta', 1.)
    alpha = pm.Bernoulli('alpha', lam, observed=obs_purchase, shape=1)
    r = pm.Exponential('r', thet, observed=obs_cost, shape=1)
    alpha_pred = pm.Bernoulli('alpha_p', lam)   # posterior-predictive: does a new session convert?
    r_pred = pm.Exponential('r_p', thet)        # posterior-predictive spend given a conversion
    revenue = pm.Deterministic('predicted_revenue', r_pred * alpha_pred)
    trace = pm.sample(500, chains=3)
    
pm.traceplot(trace);

Note that using pm.Gamma as the prior on theta should give you the same posterior as in the paper – though yours will be approximate (from MCMC) as opposed to exact.
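
For instance, you can check the MCMC posterior against the conjugate result (a sketch; it assumes you swap thet = pm.Gamma('theta', 1., 1.) into the model above and reuses obs_cost and trace from it):

from scipy import stats

# Gamma(1, 1) prior + exponential likelihood gives Gamma(1 + n, rate = 1 + total observed spend)
analytic = stats.gamma(a=1 + len(obs_cost), scale=1. / (1. + obs_cost.sum()))
print(analytic.mean(), trace['theta'].mean())   # the two should roughly agree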


@chartl, i really appreciate your help! nice explanation. i was trying to reproduce it without using arrays, just with aggregated info like this:

A_money = 5000
A_conversions = 200
A_sessions = 10000

so that's why i am changing the Bernoulli to a Binomial,
as this calculator does: https://vidogreg.shinyapps.io/bayes-arpu-test/ (based on the same paper)

but it is not working as expected, i don't know what i am doing wrong, i tried a lot of things :grimacing:
sorry, these are my first steps with this

with pm.Model() as model_infer:
    lam = pm.Beta('lambda', 1., 1.)
    #thet = pm.HalfNormal('theta', 1.)
    thet = pm.Gamma('theta', 1., 1.) #using gamma as you suggested

    #alpha = pm.Bernoulli('alpha', lam, observed=obs_purchase, shape=1)
    alpha = pm.Binomial('alpha', n=A_sessions, p=lam, observed=A_conversions)

    #r = pm.Exponential('r', thet, observed=obs_cost, shape=1)
    r = pm.Exponential('r', thet, observed=A_theta, shape=A_conversions)

    alpha_pred = pm.Bernoulli('alpha_p', lam)
    #alpha_pred = pm.Binomial('alpha_p', n=A_conversions, p=lam)

    r_pred = pm.Exponential('r_p', thet)

    revenue = pm.Deterministic('predicted_revenue', r_pred * alpha_pred)

    trace2 = pm.sample(10000, chains=4)

This is all correct except for the shape in r:

A_money = 5000
A_conversions = 200
A_sessions = 10000

TH = 0.40  # rate used to simulate per-conversion spend (numpy's exponential takes scale = 1/rate)
A_theta = np.random.exponential(1./TH, size=(A_conversions,))

with pm.Model() as model_infer:
    lam = pm.Beta('lambda', 1., 1.)
    thet = pm.Gamma('theta', 1., 1.)
    alpha = pm.Binomial('alpha', n=A_sessions, p=lam, observed=A_conversions)
    r = pm.Exponential('r', thet, observed=A_theta)
    alpha_pred = pm.Bernoulli('alpha_p', lam)
    r_pred = pm.Exponential('r_p', thet)
    revenue = pm.Deterministic('predicted_revenue', r_pred * alpha_pred)
    trace2 = pm.sample(500, chains=4)

pm.traceplot(trace2);

thank you again @chartl

regarding this:

TH=0.40

is it arbitrary? does it come from A_conversions / A_money? anyway that gives me 0.04

thank you!

Don’t worry about it – I just used it to generate A_theta since I didn’t have any data. The parameter is just the exponential rate parameter; note that where pymc3 uses theta (the rate), numpy uses 1/theta (the scale).
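
To make the parameterization difference concrete (a small sketch; TH is just an arbitrary rate):

import numpy as np
import pymc3 as pm

TH = 0.40   # the exponential rate, i.e. what pymc3 calls lam

# numpy parameterizes the exponential by the scale = 1/rate, so the mean is 1/TH = 2.5
print(np.random.exponential(1. / TH, size=100000).mean())   # ~2.5

# pymc3 parameterizes by the rate itself; this draws from the same distribution
print(pm.Exponential.dist(TH).random(size=100000).mean())   # ~2.5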

thank you @chartl
but, for example, can i get A_theta from A_money?
because that exponential distribution should come from the money spent
you wrote
A_theta = np.random.exponential(1./TH, size=(A_conversions,))
i also tried
A_theta = np.random.exponential(1./A_money, size=(A_conversions,))

but it didn't work as expected
thank you for your patience and time

So A_money is the total money spent over all conversions, right? The problem here is that you aren’t actually observing how much money is spent on each transaction, only the total.

It turns out that the sum of k i.i.d. exponential random variables with rate \theta is in fact gamma distributed, \Gamma(k, \theta).
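
You can sanity-check that fact with a quick simulation (a sketch; k and theta are arbitrary):

import numpy as np
from scipy import stats

k, theta = 200, 0.40   # number of purchases, exponential rate

# draw k i.i.d. Exp(theta) spends and sum them, many times over
totals = np.random.exponential(1. / theta, size=(20000, k)).sum(axis=1)

# compare against Gamma(k, theta); scipy uses scale = 1/theta
print(totals.mean(), totals.var())                       # ~ k/theta, ~ k/theta**2
print(stats.gamma(a=k, scale=1. / theta).stats('mv'))    # same mean and variance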

so to model the total directly:

A_money = 5000
A_conversions = 200
A_sessions = 10000

with pm.Model() as model_infer:
    lam = pm.Beta('lambda', 1., 1.)
    thet = pm.Gamma('theta', 1., 1.)

    alpha = pm.Binomial('alpha', n=A_sessions, p=lam, observed=A_conversions)

    # total revenue across A_conversions purchases: a sum of exponentials is Gamma(A_conversions, theta)
    r = pm.Gamma('r', A_conversions, thet, observed=A_money)

    alpha_pred = pm.Bernoulli('alpha_p', lam)

    r_pred = pm.Exponential('r_p', thet)

    revenue = pm.Deterministic('predicted_revenue', r_pred * alpha_pred)

    trace2 = pm.sample(500, chains=4)

pm.traceplot(trace2);
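
As a follow-up, the quantity the linked ARPU calculator reports can be read off this trace: expected revenue per session is P(conversion) times the mean spend per conversion, i.e. lambda / theta. A sketch (arpu_B is hypothetical; it would come from fitting the same model to variant B's aggregates):

# posterior revenue-per-session (ARPU) for variant A
arpu_A = trace2['lambda'] / trace2['theta']
print(arpu_A.mean())

# with a second trace for variant B, P(ARPU_B > ARPU_A) is just (arpu_B > arpu_A).mean()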

hi @chartl,
“The problem here is that you aren’t actually observing how much money is spent on each transaction”
what problem does this cause? is it that i lose precision in the model?

You lose some modeling flexibility. On the one hand you can choose to model the distribution of spending directly

r_i \sim \mathrm{Exp}(\theta) or r_i \sim \mathrm{Pareto}(\alpha) - 1

which requires the vector of money spent. This lets you use any distribution to infer the parameter and predict future expenditure.

Alternatively, you can choose to model the total spending

R = \sum_{i=1}^k r_i \sim \Gamma(k, \theta)

to infer the parameters, but now if you want to predict future expenditure (the distribution of r_i), you need some link between the parameters of P(R|\theta) and the parameters of P(r_i|\theta). This limits the distributions to those with closed-form sums, such as the exponential (a sum of i.i.d. exponentials is gamma), the normal (a sum of normals is normal), or the gamma (a sum of gammas with a common rate is gamma); but not things like the lognormal or power-law distributions.
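
For example, with the per-transaction vector in hand you could drop in a spend distribution that has no closed-form sum; a minimal sketch with a lognormal (the data here are simulated, since only aggregates appear in this thread):

import numpy as np
import pymc3 as pm

# simulated per-transaction spend; in practice this would be the vector of purchase amounts
obs_spend = np.random.lognormal(mean=1.0, sigma=0.5, size=200)

with pm.Model() as model_per_txn:
    mu = pm.Normal('mu', 0., 2.)
    sigma = pm.HalfNormal('sigma', 1.)
    # any per-transaction likelihood works here, because each r_i is observed individually
    r = pm.Lognormal('r', mu=mu, sigma=sigma, observed=obs_spend)
    trace_txn = pm.sample(500, chains=2)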

perfect @chartl, understood
but, taking scalability into account: i am modelling a big ecommerce site, and the money-spent vector could have 15 million items.
i tried pymc3 with such big numbers and it is very slow, or maybe i am doing something wrong :grimacing: