Implementing the Gamma-Gamma for customer spend in pymc3: first steps

Hello everyone.

I am writing because I am currently trying to implement in pymc3 the Gamma-Gamma model for customer spend, as described by Fader & Hardie in this paper.

This model was already implemented in the Lifetimes package.

Do you have any suggestions on how to approach this, or on how to translate the numpy/scipy implementation into pymc3 terms?

More specifically, I am trying to understand how to express the model's assumptions: the average customer spend is distributed as Gamma(px, vx), where x is the customer’s number of purchases, v is itself distributed as Gamma(q, gamma), and p, q and gamma are the model parameters. Once I sort out how to state these distributional assumptions, I can try to implement a custom likelihood function.
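
For intuition, the generative story behind these assumptions can be simulated directly with numpy, independently of any pymc3 code. The parameter values below are hypothetical, chosen purely for illustration; note that Fader & Hardie parameterize the gamma distributions by rate, while numpy uses scale:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameter values, chosen purely for illustration
p, q, gamma = 6.0, 4.0, 15.0
x = 5  # a customer's number of purchases

# v ~ Gamma(q, gamma): gamma is a rate, numpy's gamma takes a scale, so scale = 1 / gamma
v = rng.gamma(shape=q, scale=1.0 / gamma, size=100_000)

# average spend z_bar | v ~ Gamma(p * x, v * x), again converting rate to scale
z_bar = rng.gamma(shape=p * x, scale=1.0 / (v * x))

# Conditionally E[z_bar | v] = p / v; marginally E[z_bar] = p * gamma / (q - 1)
print(z_bar.mean())  # should be close to 6 * 15 / 3 = 30
```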

Any help is appreciated.

Here is my first failed attempt:

import lifetimes
import numpy as np
import pandas as pd
import pymc3 as pm
from pymc3.math import log
from pymc3.distributions.special import gammaln

cdnow_transactions = lifetimes.load_dataset(
    'CDNOW_sample.txt', 
    header=None, 
    delim_whitespace=True, 
    names=['customer_id', 'customer_index', 'date', 'quantity', 'amount'],
    converters={'date': lambda x: pd.to_datetime(x, format="%Y%m%d")}
)

rfm = lifetimes.utils.summary_data_from_transaction_data(
    cdnow_transactions,
    'customer_id',
    'date',
    monetary_value_col='amount',
    observation_period_end=pd.to_datetime('1997-09-30'),
    freq='W'
)

class GammaGamma(pm.Continuous):
    """
    Custom distribution class for Gamma Gamma likelihood.
    """
    
    def __init__(self, p, q, gamma, *args, **kwargs):
        super(GammaGamma, self).__init__(*args, **kwargs)
        self.p = p
        self.q = q
        self.gamma = gamma
        
    def logp(self, x, m_x):
        """
        Source 
        https://github.com/CamDavidsonPilon/lifetimes/blob/master/lifetimes/fitters/gamma_gamma_fitter.py
        and
        https://github.com/bachmannpatrick/CLVTools/blob/master/src/gg_LL.cpp
        """
        negative_log_likelihood_values = (
            gammaln(self.p * x + self.q)
            - gammaln(self.p * x)
            - gammaln(self.q)
            + self.q * log(self.gamma)
            + (self.p * x - 1) * log(m_x)
            + (self.p * x) * log(x)
            - (self.p * x + self.q) * log(x * m_x + self.gamma)
        )

        return -negative_log_likelihood_values.sum()

# Extract data for model following notation from Fader/Hardie
N = rfm.shape[0] # number of customers
x = rfm['frequency'].values
m_x = rfm['monetary_value'].values

n_draws = 4000


with pm.Model() as gg_model:
    
    # Uninformative priors on model hyperparameters see Polson and Scott 
    # https://projecteuclid.org/download/pdfview_1/euclid.ba/1354024466
    p = pm.HalfCauchy('p', beta=2)
    q = pm.HalfCauchy('q', beta=2)
    gamma = pm.HalfCauchy('gamma', beta=2)

    v = pm.Gamma('v', alpha=q, beta=gamma, shape=N, testval=np.random.rand(N))
    z_bar = pm.Gamma('z_bar', alpha=p * x, beta=v * x, shape=N, testval=np.random.rand(N))
    
    # Custom distribution for GammaGamma likelihood function
    loglikelihood = GammaGamma("loglikelihood", p=p, q=q, gamma=gamma, observed={'x': x, 'm_x': m_x})
    
    trace = pm.sample(n_draws, init=None, cores=1, chains=4, pickle_backend='dill') 

which fails with

Initial evaluation of model at starting point failed!

I am very new to statistical programming, but please feel free to add any feedback.

Hi, I made a Colab notebook explaining why your code failed. You can find it here. The issue seems to be with your data, not the model. Some of your observations have 0 in the frequency and monetary value columns, which causes issues with the alpha and beta parameters of your Gamma priors, which must be greater than 0. It also causes a problem later in the likelihood function, where some of the logarithms blow up because they receive 0s. Specifically:

# Your likelihood

def logp(self, x, m_x):
    negative_log_likelihood_values = (
        gammaln(self.p * x + self.q)
        - gammaln(self.p * x)
        - gammaln(self.q)
        + self.q * log(self.gamma)
        + (self.p * x - 1) * log(m_x)  # Some values in your m_x array are 0, so this term is infinite, which you can't have
        + (self.p * x) * log(x)  # Some values in your x array are 0, so this term is infinite, which you can't have
        - (self.p * x + self.q) * log(x * m_x + self.gamma)  # Same issue as the previous two lines
    )

    return -negative_log_likelihood_values.sum()

This is the technical reason why your model isn’t working. I’m not familiar with this dataset/model, though, so I don’t know how you should go about fixing your data; I naively dropped all rows with 0s from your dataset and got the model to run correctly.
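
A minimal numpy sketch of the failure mode, with made-up values:

```python
import numpy as np

# Frequencies and monetary values, including the problematic zeros
x = np.array([0.0, 1.0, 4.0])
m_x = np.array([0.0, 12.5, 38.0])

with np.errstate(divide='ignore'):
    term = np.log(m_x)  # log(0) = -inf

print(term.sum())  # -inf: a single zero observation drives the whole log-likelihood to -inf
```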

Hope this helps!

Thank you very much, that makes sense and is the same fix they apply in the R package I linked above.

In principle, the reason we see some observations with frequency x = 0 (and therefore also monetary value m_x = 0) is that the first transaction is discarded when computing these quantities. In other words, by requiring x and m_x to be nonzero, we exclude from training every customer with only a single transaction.

Thanks @jlindbloom !

The new code is:

import pymc3 as pm
import numpy as np
import pandas as pd
from lifetimes.datasets import load_dataset
from lifetimes.utils import summary_data_from_transaction_data
from pymc3.math import log
from pymc3.distributions.special import gammaln

cdnow_transactions = load_dataset(
    'CDNOW_sample.txt', 
    header=None, 
    delim_whitespace=True, 
    names=['customer_id', 'customer_index', 'date', 'quantity', 'amount'],
    converters={'date': lambda x: pd.to_datetime(x, format="%Y%m%d")}
)

rfm = summary_data_from_transaction_data(
    cdnow_transactions,
    'customer_id',
    'date',
    monetary_value_col='amount',
    observation_period_end=pd.to_datetime('1997-09-30'),
    freq='W'
)

class GammaGamma(pm.Continuous):
    """
    Custom distribution class for Gamma Gamma likelihood.
    """
    
    def __init__(self, p, q, gamma, *args, **kwargs):
        super(GammaGamma, self).__init__(*args, **kwargs)
        self.p = p
        self.q = q
        self.gamma = gamma
        
    def logp(self, x, m_x):
        """
        Source 
        https://github.com/CamDavidsonPilon/lifetimes/blob/master/lifetimes/fitters/gamma_gamma_fitter.py
        and
        https://github.com/bachmannpatrick/CLVTools/blob/master/src/gg_LL.cpp
        """
        # Note: pymc3 expects logp to return the log-likelihood itself;
        # lifetimes negates this quantity because it minimizes.
        log_likelihood_values = (
            gammaln(self.p * x + self.q)
            - gammaln(self.p * x)
            - gammaln(self.q)
            + self.q * log(self.gamma)
            + (self.p * x - 1) * log(m_x)
            + (self.p * x) * log(x)
            - (self.p * x + self.q) * log(x * m_x + self.gamma)
        )

        return log_likelihood_values.sum()

# Extract data for model following notation from Fader/Hardie

non_zero_slice = (rfm.frequency != 0) & (rfm.monetary_value != 0)

N = rfm.loc[non_zero_slice].shape[0] # number of customers
x = rfm.loc[non_zero_slice].frequency.values
m_x = rfm.loc[non_zero_slice].monetary_value.values
n_draws = 2000

with pm.Model() as gg_model:
    # Uninformative priors on model hyperparameters see Polson and Scott 
    # https://projecteuclid.org/download/pdfview_1/euclid.ba/1354024466
    p = pm.HalfCauchy('p', beta=2)
    q = pm.HalfCauchy('q', beta=2)
    gamma = pm.HalfCauchy('gamma', beta=2)

    v = pm.Gamma('v', alpha=q, beta=gamma, shape=N, testval=np.random.rand(N))
    z_bar = pm.Gamma('z_bar', alpha=p * x, beta=v * x, shape=N, testval=np.random.rand(N))
    
    # Custom distribution for GammaGamma likelihood function
    loglikelihood = GammaGamma("loglikelihood", p=p, q=q, gamma=gamma, observed={'x': x, 'm_x': m_x})
    
with gg_model:
    trace = pm.sample(n_draws, tune=2000, init=None, cores=8, chains=2, pickle_backend='dill') 
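
As a quick sanity check (not part of the thread's code, and using hypothetical parameter values): the per-observation term inside logp is the marginal density of average spend with v integrated out, so exponentiating it should integrate to one over the spend axis:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def gg_logpdf(m_bar, x, p, q, gamma):
    # Marginal log-density of average spend m_bar given x purchases,
    # i.e. the per-observation term of logp, with v integrated out analytically
    return (
        gammaln(p * x + q) - gammaln(p * x) - gammaln(q)
        + q * np.log(gamma)
        + (p * x - 1) * np.log(m_bar)
        + p * x * np.log(x)
        - (p * x + q) * np.log(x * m_bar + gamma)
    )

# Hypothetical parameter values, for illustration only
total, _ = quad(lambda m: np.exp(gg_logpdf(m, x=3, p=6.0, q=4.0, gamma=15.0)), 0, np.inf)
print(total)  # should be very close to 1.0
```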

One further question that comes to mind: is it possible to apply such a model to customers with a single purchase, or with none at all?

Maybe I’m misunderstanding the dataset, but users with a single purchase should just have 1’s and be fine? The only people I would think get thrown out are those customer id’s with zero purchases?

I looked some more at the lifetimes package; it seems to treat customers as being “alive” while they’re buying, though they may “die” at some later point and stop purchasing. In that case, would it even make sense to model the lifetime value of customers who have never purchased? It seems this group of people isn’t being captured by the model, but I could be mistaken.

Maybe I’m misunderstanding the dataset, but users with a single purchase should just have 1’s and be fine? The only people I would think get thrown out are those customer id’s with zero purchases?

In the context of CLV, and according to the literature, the purchase frequency is computed as the number of purchases minus 1 (which is why I mentioned that the first transaction is disregarded). Moreover, this is the same approach used in the lifetimes library [source]:

By default the first transaction is not included 
while calculating frequency and monetary_value

As you correctly noted, it seems the model cannot be fit on the population of customers with a number of purchases <= 1. The reason it could still be useful to estimate lifetime value for such customers is to model their future behaviour.
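
A tiny pandas sketch makes this concrete. It is a simplified stand-in for lifetimes' summary_data_from_transaction_data (it ignores dates and period aggregation), showing how a single-purchase customer ends up with frequency 0 and no monetary value:

```python
import pandas as pd

# Toy transaction log: customer 1 makes three purchases, customer 2 only one
tx = pd.DataFrame({
    'customer_id': [1, 1, 1, 2],
    'amount':      [10.0, 20.0, 30.0, 15.0],
})

# frequency = number of purchases minus one, since the first transaction is excluded
frequency = tx.groupby('customer_id').size() - 1

# monetary_value = mean spend over *repeat* purchases only
tx['nth'] = tx.groupby('customer_id').cumcount()
monetary_value = tx[tx['nth'] > 0].groupby('customer_id')['amount'].mean()

print(frequency.to_dict())       # the single-purchase customer gets frequency 0
print(monetary_value.to_dict())  # and contributes no monetary value at all
```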

One more technical question about pymc3 models: I noticed that sampling with tune=1000 and draws=2000 causes the model to produce divergences. Is there a rule of thumb for choosing these parameters?