Modelling a mixture of binomial and gamma


#1

Assume that two groups have been randomized to a two arm AB test (i.e. test and control). One group gets a nothing, the other gets an promotional offer.

The outcome is money spent. The problem is that not all customers who get the offer will spend money, so there will be a portion of customers in each group that have $0 spend.

I’d like to model this data. Let’s assume a few things about the data:

  • The probability of participating in the promotion is a function of the group. Promotions are enticing and offer a deal, so people are more likely to spend money if they are offered the promotion.

  • The distribution of spend conditional on group is Gamma with some scale and shape parameter.

  • The mean of the spend distribution is also a function of the group. Let’s say that the promotion is worded in a way to encourage spending more money.

I’ve managed to simulate some data, shown below:

import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
from scipy.special import expit
import seaborn as sns

p_offer = 0.5
b_offer = np.log(1.2)
group = np.sort(np.tile([0,1],200))

eta_response = 0 + p_offer*group
eta_spend = np.log(10) + b_offer*group

responded = np.random.binomial(1,p = expit(eta_response) )
y = responded*np.random.gamma(shape = 2, scale =np.exp(eta_spend)/2)


sns.distplot(y[group==1],kde = False, label = 'Test')
sns.distplot(y[group==0],kde = False, label = 'Control')
plt.legend()

As can be seen, the probability of participating in the offer is a function of the group, and the mean spend changes as a function of the group.

How can I model this in pymc3? I’ve written some code below:

def likelihood(p,alpha, beta, r, s):

        LL_response = pm.Binomial.dist(n = 1, p = p).logp(r).eval().sum()

        LL_spend = pm.Gamma.dist(alpha = alpha, beta = beta).logp(s).eval().sum()

        return LL_response + LL_spend

with pm.Model() as model:
    
    #Response likelihood
    
    base_response = pm.Normal('b_response', 0, 1)
    response_effect = pm.Normal('b_response_effect', 0, 1)
    response_logodds = base_response + group*response_effect 
    
    ps = pm.math.invlogit(response_logodds)
    
    #Spend likelihood
    
    base_spend = pm.Normal('b_spend', 0, 1)
    spend_effect = pm.Normal('b_spend_effect', 0, 1)
    spend_lin_pred = base_spend + group*spend_effect
    
    shape = 2
    
    mu = pm.math.exp(spend_lin_pred)
    
    #likelihood
    
    ll = pm.Potential('y', likelihood(ps,shape, mu/shape, responded, y))
    
    
    trace = pm.sample(2000)

I’ve tried to write the log likelihood for my simulated data in the function likelihood and then pass that to the Potential function, as per other questions I have seen on discourse. There is an error given to me when I attempt to run the model:

MissingInputError: Input 0 of the graph (indices start from 0), used to compute InplaceDimShuffle{x}(b_response_effect), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

So I’m a bit at a loss. Am I on the right track here? If so, what have I done wrong to elicit this error? If not, what is the correct approach?


#2

IIUC, the label for each group is fully observed (as it is a randomized experiment), so you dont need to use a mixture model. I would model two set of parameters and index the parameters according to the group.


#3

Isn’t this a case for a zero-inflated gamma model: https://discourse.mc-stan.org/t/zero-inflated-gamma-model/6788

I would love to know how to fit this in Pymc3 but can not help there.