Derivative warning while using Hamiltonian Monte Carlo


Dear all,
I am using external functions to calculate my likelihood, following the black-box likelihood example from the PyMC3 documentation.

My likelihood function is as follows:

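import numpy as np

def log_likelihood(theta, data, sigma, x, y, z):
    fm = FM.FUN1(theta, x, y, z)  # external function that computes the model data

    # Residuals between observed and modelled data (assumed definition)
    data_residuals = data - fm.flatten()

    # Gaussian log likelihood with inverse covariance (1/sigma**2) * I;
    # the leading part of the return expression is assumed
    n = len(fm.flatten())
    covd = (1 / sigma**2) * np.identity(n)
    return -0.5 * np.dot(np.dot(data_residuals.T, covd), data_residuals)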

When I run the HMC sampler through PyMC3, I get the following warning:

warnings.warn("Derivative calculation did not converge: setting flat derivative.")

The code then terminates prematurely with the following error:

raise ValueError('Too few elements for interval calculation')

ValueError: Too few elements for interval calculation

Kindly suggest how these issues can be overcome.

Thank You


I think the first warning "Derivative calculation did not converge: setting flat derivative." means the library can’t get the gradient from your custom theano operation for the log-likelihood.

  • Does your theano operation have a correctly defined grad method?
  • Does the grad method return the gradient-vector product in the correct format?
  • Is your definition of the gradient well defined for all the points you sampled? (Do you have division by zero, taking square root of negative number, or taking logarithm of a negative number/zero?)
  • Have you tested the theano operation that you defined?
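
The third question can be checked without Theano at all. Here is a minimal sketch, assuming a hypothetical stand-in toy_loglike (it is not your FM.FUN1 likelihood), that scans a few parameter values and reports where the result stops being finite:

```python
import numpy as np

def toy_loglike(theta):
    # Hypothetical stand-in for the real log-likelihood: log(theta) is
    # undefined for theta <= 0, which mimics a domain problem.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log(theta) - 0.5 * theta**2

def find_bad_points(loglike, thetas):
    """Return the parameter values where loglike is nan or infinite."""
    values = np.array([loglike(t) for t in thetas])
    return thetas[~np.isfinite(values)]

thetas = np.array([-1.0, 0.0, 0.5, 2.0])
bad = find_bad_points(toy_loglike, thetas)
print(bad)  # parameter values that produce nan or -inf
```

If your real likelihood shows up in such a list for values the sampler can reach, the gradient there is probably not well defined either.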

Theano has a finite-difference gradient checker:

If you value your sanity, always check that the gradient is ok:

theano.tests.unittest_tools.verify_grad(MuFromTheta(), [np.array(0.2)])
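
To illustrate the idea behind such a check, here is a rough NumPy-only sketch that compares a hand-written gradient with a central finite-difference estimate. The functions loglike and analytic_grad are hypothetical examples, and this is not how verify_grad is implemented internally:

```python
import numpy as np

def loglike(theta):
    # Hypothetical Gaussian-style log-likelihood with a known gradient.
    return -0.5 * np.sum(theta**2)

def analytic_grad(theta):
    # Hand-derived gradient of loglike.
    return -theta

def finite_difference_grad(f, theta, eps=1e-6):
    """Central-difference estimate of the gradient of f at theta."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

theta0 = np.array([0.2, -1.3, 0.7])
numeric = finite_difference_grad(loglike, theta0)
gradients_match = np.allclose(numeric, analytic_grad(theta0), atol=1e-6)
print(gradients_match)
```

If the two gradients disagree for your own likelihood, the grad method is the first place to look.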

Since the library can’t get the gradient from your log-likelihood, it falls back to calculating the gradient by numerical differentiation.

I think the "Too few elements for interval calculation" error means:

  1. It is using numerical differentiation.
  2. The library can’t get enough well-defined points around the current point to do numerical differentiation. The simplest numerical estimate of a first derivative needs two points to draw a line and estimate its slope, and neither point can be nan.
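
The second point can be seen in a small sketch. It assumes a hypothetical sqrt_model and a plain two-point central difference, which may differ from the scheme the library actually uses:

```python
import numpy as np

def two_point_slope(f, x, h=1e-3):
    """Central-difference slope of f at x; nan if either neighbour is nan."""
    left, right = f(x - h), f(x + h)
    return (right - left) / (2 * h)

def sqrt_model(x):
    # Hypothetical model: the square root is nan for negative inputs.
    with np.errstate(invalid="ignore"):
        return np.sqrt(x)

good = two_point_slope(sqrt_model, 1.0)  # both neighbours are valid
bad = two_point_slope(sqrt_model, 0.0)   # left neighbour is sqrt(-h), i.e. nan
print(good, bad)
```

At x = 1.0 both neighbouring points are valid and the slope is close to 0.5. At x = 0.0 one neighbour is nan, so no slope can be estimated.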