Debugging Custom likelihood


#1
```python
import pymc3 as pm
from pymc3.math import exp, log


class ParetoNBD(pm.Continuous):
    """
    Custom distribution class for the Pareto/NBD likelihood.
    """

    def __init__(self, lambda_, mu, *args, **kwargs):
        super(ParetoNBD, self).__init__(*args, **kwargs)
        self.lambda_ = lambda_  # purchasing rate
        self.mu = mu            # dropout rate (governs lifetime)

    def logp(self, x, t_x, T):
        """
        Log-likelihood of an individual customer's purchasing rate lambda
        and lifetime mu given their frequency x, recency t_x and time since
        first purchase T.
        """
        log_lambda = log(self.lambda_)
        log_mu = log(self.mu)
        mu_plus_lambda = self.lambda_ + self.mu
        log_mu_plus_lambda = log(mu_plus_lambda)

        p_1 = x * log_lambda + log_mu - log_mu_plus_lambda - t_x * mu_plus_lambda
        p_2 = (x + 1) * log_lambda - log_mu_plus_lambda - T * mu_plus_lambda

        # Naive log of a sum of exponentials: if both terms underflow, this is log(0) = -inf.
        return log(exp(p_1) + exp(p_2))
```
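For reference, here is the per-customer likelihood that `logp` computes, written out with the same symbols as the code (x = frequency, t_x = recency, T = time since first purchase). The last line is the log-sum-exp rewrite of the final `return`: the correction term never has a positive exponent, so nothing overflows, and the max term keeps the result from collapsing to log(0) = -inf when both p_1 and p_2 are very negative.

$$
\begin{aligned}
\mathcal{L}(\lambda,\mu \mid x, t_x, T) &= \frac{\lambda^{x}\,\mu}{\lambda+\mu}\,e^{-(\lambda+\mu)t_x} + \frac{\lambda^{x+1}}{\lambda+\mu}\,e^{-(\lambda+\mu)T} \\
p_1 &= x\log\lambda + \log\mu - \log(\lambda+\mu) - t_x(\lambda+\mu) \\
p_2 &= (x+1)\log\lambda - \log(\lambda+\mu) - T(\lambda+\mu) \\
\log\mathcal{L} &= \log\left(e^{p_1} + e^{p_2}\right) = \max(p_1,p_2) + \log\left(1 + e^{-\lvert p_1 - p_2\rvert}\right)
\end{aligned}
$$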

This is a custom distribution I have been using, taken from this notebook:
https://github.com/benvandyke/pydata-seattle-2017/blob/master/lifetime-value/pareto-nbd.ipynb

I am running into issues when I run this model on a different dataset. Some values of mu become NaN during sampling, and that causes the following errors:

The derivative of RV `lambda_log__`.ravel()[5921] is zero.
The derivative of RV `lambda_log__`.ravel()[5922] is zero.
The derivative of RV `beta_log__`.ravel()[0] is zero.
The derivative of RV `s_log__`.ravel()[0] is zero.
The derivative of RV `alpha_log__`.ravel()[0] is zero.
The derivative of RV `r_log__`.ravel()[0] is zero.

A couple of questions:

  • How can I debug why some of my mu values are NaN, and is there a way to overcome that?
  • Is there a way for me to use logsumexp for log(exp(p_1) + exp(p_2))?
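To see why the form of `log(exp(p_1) + exp(p_2))` matters, here is a NumPy mirror of the two terms in `logp` (an assumption: NumPy standing in for the Theano ops behind `pymc3.math`, so this is not the model code itself). The function names and the customer/rate values are hypothetical; the large rates mimic what a very flat prior can propose. `np.logaddexp` computes log(exp(a) + exp(b)) without forming the exponentials directly.

```python
import numpy as np


def pareto_nbd_terms(lam, mu, x, t_x, T):
    """NumPy mirror of p_1 and p_2 from ParetoNBD.logp."""
    log_mu_plus_lambda = np.log(lam + mu)
    p_1 = x * np.log(lam) + np.log(mu) - log_mu_plus_lambda - t_x * (lam + mu)
    p_2 = (x + 1) * np.log(lam) - log_mu_plus_lambda - T * (lam + mu)
    return p_1, p_2


def naive_logp(lam, mu, x, t_x, T):
    """log(exp(p_1) + exp(p_2)) exactly as written in the original logp."""
    p_1, p_2 = pareto_nbd_terms(lam, mu, x, t_x, T)
    return np.log(np.exp(p_1) + np.exp(p_2))


def stable_logp(lam, mu, x, t_x, T):
    """Same quantity, computed stably with np.logaddexp."""
    p_1, p_2 = pareto_nbd_terms(lam, mu, x, t_x, T)
    return np.logaddexp(p_1, p_2)


# Hypothetical customer: 2 repeat purchases, last at t=30, observed for T=40.
x, t_x, T = 2, 30.0, 40.0

# Moderate rates: both versions agree.
print(naive_logp(0.1, 0.05, x, t_x, T), stable_logp(0.1, 0.05, x, t_x, T))

# Very large rates: both exp() calls underflow to 0, so the naive version hits log(0).
with np.errstate(divide="ignore"):
    print(naive_logp(20.0, 20.0, x, t_x, T))  # -inf
print(stable_logp(20.0, 20.0, x, t_x, T))     # finite, about -1194.70
```

A -inf log-likelihood gives the sampler no usable gradient, which is one plausible route to the errors above; the stable form removes that particular failure but does not fix an overly flat prior.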

Help is greatly appreciated!!


#2

If you follow the steps in the Frequently Asked Questions and find no problem with the model's test point, the "derivative ... is zero" error usually means a prior is too flat; changing to more informative priors usually fixes it.
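The test-point check (in PyMC3, `model.check_test_point()` reports the log-probability of each variable) can also be done per customer outside the model: evaluate the likelihood at specific parameter values and list the customers whose value is not finite. This is a NumPy sketch; `pareto_nbd_logp`, `bad_customers` and the three-customer data are hypothetical. Customer 1 illustrates one arithmetic trap: if a rate underflows to exactly 0 for a customer with x = 0, then `x * log(lambda)` is 0 * -inf, which is NaN.

```python
import numpy as np


def pareto_nbd_logp(lam, mu, x, t_x, T):
    """Per-customer Pareto/NBD log-likelihood (NumPy), vectorised over customers."""
    log_mu_plus_lambda = np.log(lam + mu)
    p_1 = x * np.log(lam) + np.log(mu) - log_mu_plus_lambda - t_x * (lam + mu)
    p_2 = (x + 1) * np.log(lam) - log_mu_plus_lambda - T * (lam + mu)
    return np.logaddexp(p_1, p_2)


def bad_customers(lam, mu, x, t_x, T):
    """Indices of customers whose log-likelihood is NaN or +/-inf at these values."""
    with np.errstate(divide="ignore", invalid="ignore"):
        logp = pareto_nbd_logp(lam, mu, x, t_x, T)
    return np.flatnonzero(~np.isfinite(logp))


# Hypothetical data for three customers.
x = np.array([2, 0, 5])
t_x = np.array([30.0, 0.0, 12.0])
T = np.array([40.0, 35.0, 20.0])

# Hypothetical per-customer rates, e.g. values taken from one sampler draw.
lam = np.array([0.1, 0.0, 0.3])     # customer 1: rate has underflowed to exactly 0
mu = np.array([0.05, 0.1, np.nan])  # customer 2: NaN, as in the question

print(bad_customers(lam, mu, x, t_x, T))  # [1 2]
```

Looking at the flagged customers' x, t_x and T (and at how extreme the rates are) usually narrows down whether the data or the priors are at fault.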

There is a logsumexp function in pymc3 (`pymc3.math.logsumexp`).
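`pymc3.math.logsumexp` reduces a tensor along an axis, so the two terms need to be stacked first. The sketch below shows that pattern in NumPy; this `logsumexp` is a hypothetical stand-in using the standard max-shift trick, not PyMC3's code, and the p_1/p_2 values are made up.

```python
import numpy as np


def logsumexp(x, axis=None, keepdims=False):
    """log(sum(exp(x))) along `axis`, using the max-shift trick."""
    x = np.asarray(x, dtype=float)
    x_max = np.max(x, axis=axis, keepdims=True)
    # After the shift the largest entry contributes exp(0) = 1, so the sum is
    # at least 1: no overflow, and no underflow to log(0).
    out = np.log(np.sum(np.exp(x - x_max), axis=axis, keepdims=True)) + x_max
    return out if keepdims else np.squeeze(out, axis=axis)


# Hypothetical values of the two log-terms for two customers.
p_1 = np.array([-1194.7, -10.2])
p_2 = np.array([-1594.7, -11.0])

terms = np.stack([p_1, p_2])      # shape (2, n_customers)
print(logsumexp(terms, axis=0))   # one finite log-likelihood per customer

with np.errstate(divide="ignore"):
    print(np.log(np.sum(np.exp(terms), axis=0)))  # naive version: first entry is -inf
```

In the model itself, the analogous step would be stacking `p_1` and `p_2` (for example with `theano.tensor.stack`) and passing the result to `pymc3.math.logsumexp` with `axis=0`; check the shape it returns, since the reduced axis may be kept. For exactly two terms, `np.logaddexp(p_1, p_2)` gives the same numbers.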


#3

Thanks!! Using informative priors solved the NaN problem.