Assume that two groups have been randomized to a two arm AB test (i.e. test and control). One group gets a nothing, the other gets an promotional offer.
The outcome is money spent. The problem is that not all customers who get the offer will spend money, so there will be a portion of customers in each group that have $0 spend.
I’d like to model this data. Let’s assume a few things about the data:
-
The probability of participating in the promotion is a function of the group. Promotions are enticing and offer a deal, so people are more likely to spend money if they are offered the promotion.
-
The distribution of spend conditional on group is Gamma with some scale and shape parameter.
-
The mean of the spend distribution is also a function of the group. Let’s say that the promotion is worded in a way to encourage spending more money.
I’ve managed to simulate some data, shown below:
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
from scipy.special import expit
import seaborn as sns
p_offer = 0.5
b_offer = np.log(1.2)
group = np.sort(np.tile([0,1],200))
eta_response = 0 + p_offer*group
eta_spend = np.log(10) + b_offer*group
responded = np.random.binomial(1,p = expit(eta_response) )
y = responded*np.random.gamma(shape = 2, scale =np.exp(eta_spend)/2)
sns.distplot(y[group==1],kde = False, label = 'Test')
sns.distplot(y[group==0],kde = False, label = 'Control')
plt.legend()
As can be seen, the probability of participating in the offer is a function of the group, and the mean spend changes as a function of the group.
How can I model this in pymc3? I’ve written some code below:
def likelihood(p,alpha, beta, r, s):
LL_response = pm.Binomial.dist(n = 1, p = p).logp(r).eval().sum()
LL_spend = pm.Gamma.dist(alpha = alpha, beta = beta).logp(s).eval().sum()
return LL_response + LL_spend
with pm.Model() as model:
#Response likelihood
base_response = pm.Normal('b_response', 0, 1)
response_effect = pm.Normal('b_response_effect', 0, 1)
response_logodds = base_response + group*response_effect
ps = pm.math.invlogit(response_logodds)
#Spend likelihood
base_spend = pm.Normal('b_spend', 0, 1)
spend_effect = pm.Normal('b_spend_effect', 0, 1)
spend_lin_pred = base_spend + group*spend_effect
shape = 2
mu = pm.math.exp(spend_lin_pred)
#likelihood
ll = pm.Potential('y', likelihood(ps,shape, mu/shape, responded, y))
trace = pm.sample(2000)
I’ve tried to write the log likelihood for my simulated data in the function likelihood
and then pass that to the Potential
function, as per other questions I have seen on discourse. There is an error given to me when I attempt to run the model:
MissingInputError: Input 0 of the graph (indices start from 0), used to compute InplaceDimShuffle{x}(b_response_effect), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.
So I’m a bit at a loss. Am I on the right track here? If so, what have I done wrong to elicit this error? If not, what is the correct approach?