Hi,

I’m trying to implement the BG/NBD model from Fader et al. (2005), “Counting Your Customers” the Easy Way: An Alternative to the Pareto/NBD Model (link).

I *think* I’ve implemented the log-likelihood function correctly, but I’m having convergence issues that lead me to believe something is misspecified.

This is my first time implementing something like this myself, so I’d appreciate any help or advice out there. Thanks in advance!

Claus

(This is partially based on this post and this PyData tutorial related to the Pareto/NBD model.)

I’m using the CDNow dataset of purchasing data via the `lifetimes` library. This is the same dataset used in the paper, so I’m fairly confident a BG/NBD model is applicable.

```
import pandas as pd
from lifetimes.utils import summary_data_from_transaction_data
from lifetimes.datasets import load_dataset

cdnow_transactions = load_dataset(
    'CDNOW_sample.txt',
    header=None,
    delim_whitespace=True,
    names=['customer_id', 'customer_index', 'date', 'quantity', 'amount'],
    converters={'date': lambda x: pd.to_datetime(x, format="%Y%m%d")}
)

rfm = summary_data_from_transaction_data(
    cdnow_transactions,
    'customer_id',
    'date',
    observation_period_end=pd.to_datetime('1997-09-30'),
    freq='W'
)

# Extract data for model following notation from Fader/Hardie
N = rfm.shape[0]  # number of customers
x = rfm['frequency'].values
t_x = rfm['recency'].values
T = rfm['T'].values
```
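For reference, here is how I understand the three quantities, sketched in plain pandas on a toy transaction log (hypothetical customer IDs and dates, and ignoring the weekly binning that `freq='W'` applies): frequency is the number of *repeat* purchases, recency is the time from first to last purchase, and T is the time from first purchase to the end of the observation period.

```python
import pandas as pd

# Toy transaction log (hypothetical data, just to illustrate the definitions)
tx = pd.DataFrame({
    'customer_id': [1, 1, 1, 2],
    'date': pd.to_datetime(['1997-01-01', '1997-01-15', '1997-03-01', '1997-02-01']),
})
obs_end = pd.to_datetime('1997-09-30')

g = tx.groupby('customer_id')['date']
first, last = g.min(), g.max()
summary = pd.DataFrame({
    'frequency': g.nunique() - 1,             # repeat purchases only
    'recency': (last - first).dt.days / 7.0,  # weeks from first to last purchase
    'T': (obs_end - first).dt.days / 7.0,     # weeks from first purchase to period end
})
print(summary)
```

Customers with a single purchase end up with frequency 0 and recency 0, which is why the likelihood below needs the \delta_{x>0} indicator.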

Per the paper, this is the likelihood function:

L(\lambda, p \mid X=x, t_x, T) = (1-p)^x \lambda^x e^{-\lambda T} + \delta_{x>0}\, p(1-p)^{x-1}\lambda^x e^{-\lambda t_x}

Since the likelihood is a sum of two terms, the log doesn’t distribute over it, so I work with the logs of the two terms separately:

A_1 = x\log(1-p) + x\log(\lambda) - \lambda T

A_2 = \log(p) + (x-1)\log(1-p) + x\log(\lambda) - \lambda t_x

and take the log-likelihood to be

LL(\lambda, p \mid X=x, t_x, T) = \log\left(e^{A_1} + \delta_{x>0}\, e^{A_2}\right)
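As a quick sanity check (plain NumPy, parameter values chosen arbitrarily), writing A_1 and A_2 for the logs of the two likelihood terms, the log-sum form agrees numerically with taking the log of the likelihood directly:

```python
import numpy as np

lam, p = 0.5, 0.3          # arbitrary test values
x, t_x, T = 4, 20.0, 30.0  # a customer with repeat purchases

# Likelihood directly, as in the paper
L = (1 - p)**x * lam**x * np.exp(-lam * T) \
    + (x > 0) * p * (1 - p)**(x - 1) * lam**x * np.exp(-lam * t_x)

# Log-likelihood via the logs of the two terms
A1 = x * np.log(1 - p) + x * np.log(lam) - lam * T
A2 = np.log(p) + (x - 1) * np.log(1 - p) + x * np.log(lam) - lam * t_x
LL = np.log(np.exp(A1) + (x > 0) * np.exp(A2))

print(np.isclose(LL, np.log(L)))  # True
```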

And then implemented in PyMC3 like so:

(*UPDATED* per @junpenglao’s suggestion)

```
import pymc3 as pm
import theano.tensor as tt

class BGNBD(pm.Continuous):
    """
    Custom distribution class for BG/NBD likelihood.
    """
    def __init__(self, lambda_, p, *args, **kwargs):
        super(BGNBD, self).__init__(*args, **kwargs)
        self.lambda_ = lambda_
        self.p = p

    def logp(self, x, t_x, T):
        """
        Log-likelihood of an individual customer's purchasing rate \lambda
        and dropout probability p given their
        - frequency (x)
        - recency (t_x)
        - time since first purchase (T)
        """
        lambda_ = self.lambda_
        p = self.p
        log_lambda = tt.log(lambda_)
        log_p = tt.log(p)
        log_1_p = tt.log(1 - p)
        delta_x = tt.where(x > 0, 1, 0)
        A1 = x * log_1_p + x * log_lambda - lambda_ * T
        A2 = log_p + (x - 1) * log_1_p + x * log_lambda - lambda_ * t_x
        # The indicator must multiply exp(A2), not A2 itself: with
        # delta_x * A2 inside the exp, customers with x == 0 would
        # contribute exp(0) = 1, a spurious extra term.
        return tt.log(tt.exp(A1) + delta_x * tt.exp(A2))
```

I then specified the model with the following priors:

```
import numpy as np

bgnbd_model = pm.Model()
with bgnbd_model:
    # Hyperparameters for lambda, i.e. rate of number of transactions
    r = pm.HalfCauchy('r', beta=2)
    alpha = pm.HalfCauchy('alpha', beta=2)

    # Gamma prior on purchasing rate parameter lambda
    # Note: per Fader et al, we use r and alpha to parameterize Gamma(lambda)
    lambda_ = pm.Gamma('lambda', alpha=r, beta=alpha,
                       shape=N, testval=np.random.rand(N))

    # Hyperparameters for p, i.e. probability of dropout
    a = pm.HalfCauchy('a', beta=2)
    b = pm.HalfCauchy('b', beta=2)

    # Beta prior on probability of dropout p
    p = pm.Beta('p', alpha=a, beta=b, shape=N, testval=np.random.rand(N))

    # Custom distribution for BG/NBD likelihood function
    loglikelihood = BGNBD("loglikelihood", p=p, lambda_=lambda_,
                          observed={'x': x, 't_x': t_x, 'T': T})
```
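Not part of the model itself, but to get a feel for what these hyperpriors imply, I also drew from them directly in NumPy (a half-Cauchy is the absolute value of a Cauchy; the `half_cauchy` helper exists only in this sketch). The HalfCauchy(beta=2) hyperpriors are heavy-tailed, so the implied draws of lambda can be extreme:

```python
import numpy as np

rng = np.random.default_rng(42)

def half_cauchy(beta, size, rng):
    # |Cauchy(0, beta)| is half-Cauchy distributed
    return np.abs(beta * rng.standard_cauchy(size))

n = 1000
r = half_cauchy(2.0, n, rng)
alpha = half_cauchy(2.0, n, rng)
a = half_cauchy(2.0, n, rng)
b = half_cauchy(2.0, n, rng)

# Implied prior draws for a single customer's lambda and p
lam = rng.gamma(shape=r, scale=1.0 / alpha)
p = rng.beta(a, b)

print(np.median(lam), np.median(p))
```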

I sampled this with anywhere from 2,000 to 15,000 draws and various levels of tuning, e.g.:

```
# from Austin Rochford
SEED = 22  # from random.org, for reproducibility
SAMPLE_KWARGS = {
    'chains': 2,
    'n_jobs': 2,
    'draws': 2000,
    'tune': 1000,
    'target_accept': 0.8,
    'random_seed': [
        SEED,
        SEED + 1
    ]
}

with bgnbd_model:
    trace = pm.sample(**SAMPLE_KWARGS)
```

But I encounter a lot of divergences and effective-sample-size warnings:

```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (3 chains in 4 jobs)
NUTS: [p_logodds__, b_log__, a_log__, lambda_log__, alpha_log__, r_log__]
100%|██████████| 6000/6000 [04:48<00:00, 20.78it/s]
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.7110400521268769, but should be close to 0.8. Try to increase the number of tuning steps.
The estimated number of effective samples is smaller than 200 for some parameters.
```