Assume that customers have been randomized into the two arms of an A/B test (i.e. test and control). One group gets nothing; the other gets a promotional offer.
The outcome is money spent. The problem is that not every customer will spend money, whether or not they get the offer, so a portion of customers in each group will have $0 spend.
I’d like to model this data. Let’s assume a few things about the data:

The probability of participating in the promotion is a function of the group. Promotions are enticing and offer a deal, so people are more likely to spend money if they are offered the promotion.

Among customers who do spend, the distribution of spend conditional on group is Gamma with some shape and scale parameters.

The mean of the spend distribution is also a function of the group. Let’s say that the promotion is worded in a way to encourage spending more money.
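To make the three assumptions concrete, this is roughly the model I have in mind. The symbols are notation I'm introducing here rather than anything standard: $g_i$ is the group indicator, $r_i$ indicates whether customer $i$ spends at all, $s_i$ is the spend given that they do, and $k$ is the Gamma shape. The scale is written as $\mu_i / k$ so that the conditional mean is $\mu_i$.

```latex
\begin{aligned}
r_i &\sim \mathrm{Bernoulli}(p_i), & \operatorname{logit}(p_i) &= \alpha_0 + \alpha_1 g_i \\
s_i \mid r_i = 1 &\sim \mathrm{Gamma}\!\left(\text{shape} = k,\ \text{scale} = \mu_i / k\right), & \log \mu_i &= \beta_0 + \beta_1 g_i \\
y_i &= r_i \, s_i
\end{aligned}
```

So $\alpha_1$ is the effect of the offer on the log-odds of spending, and $\beta_1$ is its effect on the log of mean spend among spenders.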
I’ve managed to simulate some data, shown below:
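import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
from scipy.special import expit
import seaborn as sns

# Effect sizes; note that p_offer is a shift on the log-odds scale, not a probability
p_offer = 0.5
b_offer = np.log(1.2)

# 200 control (0) customers followed by 200 test (1) customers
group = np.sort(np.tile([0, 1], 200))

# Linear predictors: log-odds of spending and log of mean spend
eta_response = 0 + p_offer * group
eta_spend = np.log(10) + b_offer * group

# Non-responders spend 0; responders draw Gamma(shape=2) spend with mean exp(eta_spend)
responded = np.random.binomial(1, p=expit(eta_response))
y = responded * np.random.gamma(shape=2, scale=np.exp(eta_spend) / 2)

sns.distplot(y[group == 1], kde=False, label='Test')
sns.distplot(y[group == 0], kde=False, label='Control')
plt.legend()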
As can be seen, the probability of participating in the offer is a function of the group, and the mean spend changes as a function of the group.
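As a numeric check alongside the histogram, here is a small sketch that re-runs the same simulation with a fixed seed and summarizes each group. The `summarize` helper and the seed value are my own additions for illustration; the generating values are the ones used above.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility

# Same data-generating process as in the simulation above
group = np.sort(np.tile([0, 1], 200))
eta_response = 0 + 0.5 * group
eta_spend = np.log(10) + np.log(1.2) * group
responded = rng.binomial(1, p=expit(eta_response))
y = responded * rng.gamma(shape=2, scale=np.exp(eta_spend) / 2)


def summarize(y, group):
    """Return {group: (share with nonzero spend, mean spend among spenders)}."""
    out = {}
    for g in np.unique(group):
        spend = y[group == g]
        positive = spend[spend > 0]
        out[int(g)] = (positive.size / spend.size, positive.mean())
    return out


summary = summarize(y, group)
for g, (share, mean_spend) in summary.items():
    print(f"group {g}: share spending = {share:.2f}, "
          f"mean spend given spend > 0 = {mean_spend:.2f}")
```

Under the generating values, the expected share of spenders is expit(0) = 0.5 in control and expit(0.5), about 0.62, in test, with mean spend among spenders of 10 and 12 respectively. The sample values will vary around these.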
How can I model this in pymc3? I’ve written some code below:
def likelihood(p, alpha, beta, r, s):
    LL_response = pm.Binomial.dist(n = 1, p = p).logp(r).eval().sum()
    LL_spend = pm.Gamma.dist(alpha = alpha, beta = beta).logp(s).eval().sum()
    return LL_response + LL_spend

with pm.Model() as model:
    #Response likelihood
    base_response = pm.Normal('b_response', 0, 1)
    response_effect = pm.Normal('b_response_effect', 0, 1)
    response_logodds = base_response + group*response_effect
    ps = pm.math.invlogit(response_logodds)
    #Spend likelihood
    base_spend = pm.Normal('b_spend', 0, 1)
    spend_effect = pm.Normal('b_spend_effect', 0, 1)
    spend_lin_pred = base_spend + group*spend_effect
    shape = 2
    mu = pm.math.exp(spend_lin_pred)
    #likelihood
    ll = pm.Potential('y', likelihood(ps,shape, mu/shape, responded, y))
    trace = pm.sample(2000)
I’ve tried to write the log likelihood for my simulated data in the function likelihood and then pass that to pm.Potential, as per other questions I have seen on Discourse. When I attempt to run the model, I get the following error:
MissingInputError: Input 0 of the graph (indices start from 0), used to compute InplaceDimShuffle{x}(b_response_effect), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.
So I’m a bit at a loss. Am I on the right track here? If so, what have I done wrong to elicit this error? If not, what is the correct approach?
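For reference, this is the log likelihood I believe I'm trying to express, written in plain NumPy/SciPy with no PyMC3 involved. It is only a sketch under my own assumptions: the function name `hurdle_gamma_loglik` is made up for illustration, a customer counts as a responder exactly when their spend is positive, and the Gamma term is applied only to positive spends. SciPy's Gamma is parameterized by scale here to match the simulation, whereas `pm.Gamma`'s `beta` is a rate, so the mapping between the two is worth double-checking.

```python
import numpy as np
from scipy import stats
from scipy.special import expit


def hurdle_gamma_loglik(params, group, y, shape=2.0):
    """Log likelihood of a Bernoulli response plus Gamma spend among responders."""
    b_response, b_response_effect, b_spend, b_spend_effect = params
    responded = (y > 0).astype(int)  # assumption: zero spend means no response

    p = expit(b_response + b_response_effect * group)
    mu = np.exp(b_spend + b_spend_effect * group)

    # Response part: every customer contributes a Bernoulli term
    ll_response = stats.bernoulli.logpmf(responded, p).sum()

    # Spend part: only positive spends contribute; scale = mu / shape gives mean mu
    pos = y > 0
    ll_spend = stats.gamma.logpdf(y[pos], a=shape, scale=mu[pos] / shape).sum()
    return ll_response + ll_spend


# Usage on data simulated as in the question (seed is arbitrary)
rng = np.random.default_rng(1)
group = np.sort(np.tile([0, 1], 200))
responded = rng.binomial(1, p=expit(0.5 * group))
y = responded * rng.gamma(shape=2, scale=np.exp(np.log(10) + np.log(1.2) * group) / 2)

true_params = (0.0, 0.5, np.log(10), np.log(1.2))
bad_params = (0.0, 0.5, np.log(1000), 0.0)  # mean spend far too large
ll_true = hurdle_gamma_loglik(true_params, group, y)
ll_bad = hurdle_gamma_loglik(bad_params, group, y)
print(ll_true, ll_bad)
```

The generating parameters should score a much higher log likelihood than the deliberately poor ones, which is the behaviour I'd want the PyMC3 model to reproduce.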