How have you approached the problem for larger sets of customers, say 1 million or more?
For large datasets, BG/NBD and other BTYD models like Pareto/NBD are well-behaved when fitting point estimates of the parameters via Maximum a Posteriori (MAP), thanks to their unique underlying conjugate prior assumptions (don't use MAP for other Bayesian models, though!). You do lose credible intervals for parameters and predictions with MAP, but these models carry a dimension for each customer, so with a million or more customers, RAM limits make full posterior sampling a real consideration. On that note, pymc-marketing also has an open PR right now for ADVI, which could enable minibatch model fits on GPUs.
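For concreteness, here's a minimal sketch of a MAP fit with pymc-marketing's `BetaGeoModel`; the tiny RFM frame is a stand-in for your own customer summary data, and you should check the current docs for the exact column requirements:

```python
import pandas as pd
from pymc_marketing.clv import BetaGeoModel

# Toy RFM summary -- replace with your own customer-level data.
# frequency = repeat purchase count, recency = time of last repeat
# purchase, T = customer age (all in the same time units).
data = pd.DataFrame({
    "customer_id": [0, 1, 2, 3],
    "frequency": [6, 0, 2, 1],
    "recency": [30.0, 0.0, 12.5, 4.0],
    "T": [38.0, 38.0, 28.0, 24.0],
})

model = BetaGeoModel(data=data)

# MAP returns fast point estimates of the population parameters
# (r, alpha, a, b) instead of full posterior draws, which is what
# keeps memory flat as the customer count grows.
model.fit(fit_method="map")
print(model.fit_summary())
```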
@Keith_Min hierarchical support is definitely something I'm looking to add to the Gamma-Gamma spend model later this year, but there are additional considerations for BG/NBD and the other transaction models: they make strong population-homogeneity assumptions that would be violated by segmenting on spending behavior alone. Segmenting by geographic region, however, could be viable (see the sketch below).
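As a rough sketch of the region idea: until hierarchical support lands, one workable pattern is simply fitting an independent BG/NBD model per region, treating each region as its own homogeneous population. The `region` column here is hypothetical; substitute whatever segment label you have:

```python
import pandas as pd
from pymc_marketing.clv import BetaGeoModel

def fit_per_region(rfm: pd.DataFrame) -> dict:
    """Fit an independent BG/NBD model for each region.

    Assumes `rfm` has the usual customer_id/frequency/recency/T
    columns plus a hypothetical `region` label.
    """
    models = {}
    for region, group in rfm.groupby("region"):
        cols = ["customer_id", "frequency", "recency", "T"]
        model = BetaGeoModel(data=group[cols].reset_index(drop=True))
        model.fit(fit_method="map")  # fast point estimates per region
        models[region] = model
    return models
```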
Do you train different models for different types of customers - say new vs existing vs lapsed, or by region or by line-of-business, retail banking vs online banking vs drive-thru banking?
BG/NBD works best for retail transactions; don't use it for subscription renewals. In a non-contractual setting like retail, lapsing is unobservable, and estimating the probability that a customer has silently churned is precisely what these models were built to do. Your banking example would be a good application for static covariates, which are currently supported by the Pareto/NBD model.
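Here's a hedged sketch of what that could look like with `ParetoNBDModel`. The `channel_online` column is a hypothetical encoding of your line-of-business (e.g., online vs retail banking), and the `purchase_covariate_cols` / `dropout_covariate_cols` config keys are my reading of the current pymc-marketing API, so double-check the docs:

```python
import pandas as pd
from pymc_marketing.clv import ParetoNBDModel

# Toy RFM summary with one static covariate; `channel_online` is a
# hypothetical 0/1 encoding you'd build from your own data.
data = pd.DataFrame({
    "customer_id": [0, 1, 2, 3],
    "frequency": [5, 1, 0, 3],
    "recency": [26.0, 3.0, 0.0, 15.0],
    "T": [40.0, 40.0, 35.0, 30.0],
    "channel_online": [1, 0, 0, 1],
})

# Static covariates can act on the purchase rate, the dropout rate,
# or both; here the same covariate is used for each.
model = ParetoNBDModel(
    data=data,
    model_config={
        "purchase_covariate_cols": ["channel_online"],
        "dropout_covariate_cols": ["channel_online"],
    },
)
model.fit(fit_method="map")
print(model.fit_summary())
```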