NotImplementedError for HamiltonianMC and NUTS sampler

Dear all

I am trying to estimate a set of subsurface model parameters (say, H1, H2, H3, R1, R2, R3) from earthquake ground motion data. I can predict the data from the model using a numerical method, predictRF(H1, H2, H3, R1, R2, R3), which takes about 0.1 seconds per call. However, the Bayesian inference takes far too long. While trying different samplers, I run into NotImplementedErrors, which I would like to understand and ideally get rid of.

The following setup runs for 50 hours:

import pickle

import pymc3 as pm

# Load the observed data
with open(infile, 'rb') as f:
    rf_obs = pickle.load(f)

with pm.Model() as model:

    # Truncated normal priors for the R parameters
    R1 = pm.TruncatedNormal('Ruc', mu=1.75, sigma=0.1, lower=1.42)
    R2 = pm.TruncatedNormal('Rlvl', mu=1.75, sigma=0.1, lower=1.42)
    R3 = pm.TruncatedNormal('Rloc', mu=1.75, sigma=0.1, lower=1.42)

    # Truncated normal priors for the H parameters
    H1 = pm.TruncatedNormal('Huc', mu=35, sigma=10, lower=0)
    H2 = pm.TruncatedNormal('Hlvl', mu=5, sigma=1, lower=0)
    H3 = pm.TruncatedNormal('Haml', mu=5, sigma=5, lower=0)

    # Forward model: predict the data for the current parameter values
    rf_pred = predictRF(H1, H2, H3, R1, R2, R3)

    # Laplace likelihood centered on the prediction
    L1 = pm.Laplace('L1', mu=rf_pred, b=1, observed=rf_obs)

    inference = pm.sample(2000, tune=1500, cores=12,
                          return_inferencedata=True)

I need to increase the number of samples to arrive at a stable parameter estimate, but for any initialization method, I get

Initializing NUTS failed. Falling back to elementwise auto-assignment.

Inspecting inference suggests that the 50-hour runtime is because most steps are discarded. Setting more informative priors and/or a start point close to a parameter estimate I obtained from simulated annealing doesn't help. So I wanted to try different samplers:

    step = pm.Metropolis()
    inference = pm.sample(2000, tune=1500, cores=12,
                          step=step,
                          return_inferencedata=True)

works, but still takes 20 hours. When I do

    step = pm.HamiltonianMC()

I end up with

  File "(...)/theano/graph/op.py", line 299, in grad
    raise NotImplementedError()
NotImplementedError

but I wonder what is not implemented here. Interestingly enough, I get the same error with

    step = pm.NUTS()

even though I assumed this would replicate the default behavior, which worked (see the first example).

Could you help me get the HamiltonianMC and/or NUTS sampler working? Which parameters do you think might help to get a better acceptance rate and speed things up?

Thanks in advance!
wasja

You can't use HMC or NUTS because the gradient is not implemented. From your code, this must be due to predictRF. I'd recommend reading Using a "black box" likelihood function — PyMC documentation, or its numpy version (still an open PR, but basically finished): Using a "black box" likelihood function (numpy) — PyMC documentation. The numpy version might be a better introduction if you are not familiar with Cython.
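
In case it is useful to see the shape of that pattern before reading the notebooks: it amounts to wrapping the black-box log-likelihood in a Theano Op. A minimal sketch, assuming predictRF returns an array with the same shape as rf_obs (the class name RFLogLike is invented for illustration):

    import numpy as np
    import theano.tensor as tt
    import pymc3 as pm

    class RFLogLike(tt.Op):
        """Theano Op wrapping the black-box Laplace log-likelihood."""
        itypes = [tt.dvector]  # one vector holding [H1, H2, H3, R1, R2, R3]
        otypes = [tt.dscalar]  # scalar log-likelihood

        def __init__(self, data):
            self.data = data

        def perform(self, node, inputs, outputs):
            (theta,) = inputs
            pred = predictRF(*theta)  # call the wrapped forward model
            # Laplace log-likelihood with b=1, dropping the constant term
            outputs[0][0] = np.array(-np.sum(np.abs(self.data - pred)))

    loglike = RFLogLike(rf_obs)

    with pm.Model() as model:
        # same TruncatedNormal priors as in the original model
        R1 = pm.TruncatedNormal('Ruc', mu=1.75, sigma=0.1, lower=1.42)
        R2 = pm.TruncatedNormal('Rlvl', mu=1.75, sigma=0.1, lower=1.42)
        R3 = pm.TruncatedNormal('Rloc', mu=1.75, sigma=0.1, lower=1.42)
        H1 = pm.TruncatedNormal('Huc', mu=35, sigma=10, lower=0)
        H2 = pm.TruncatedNormal('Hlvl', mu=5, sigma=1, lower=0)
        H3 = pm.TruncatedNormal('Haml', mu=5, sigma=5, lower=0)

        # collect the parameters into one vector and attach the likelihood
        theta = tt.as_tensor_variable([H1, H2, H3, R1, R2, R3])
        pm.Potential('likelihood', loglike(theta))

Gradient-free samplers (Metropolis, Slice) work with this as is; NUTS and HMC additionally need the Op to implement grad().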

1 Like

Yes, predictRF() is wrapped Fortran code. Finding the gradient of this function is a common issue and a matter of ongoing research. Thanks for pointing me to the right documentation; these articles are very helpful. I will come back here to report on any success or failure.

1 Like

OriolAbril's solution worked; however, the combination of a finite-difference gradient and the NUTS/HMC sampler is in my case still ~5x slower than the Metropolis sampler (which needs no gradient). But the endeavor was totally worth it and I learned a lot. Thanks for your quick help!
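
For anyone landing here later, here is roughly what that gradient wiring looks like: a second Op computes forward-difference partials (one extra predictRF call per parameter), and the main Op returns it from grad(). This is only a sketch under the same assumptions as the earlier one; eps and the class names are invented:

    import numpy as np
    import theano.tensor as tt

    class RFLogLikeGrad(tt.Op):
        """Forward-difference gradient of the black-box log-likelihood."""
        itypes = [tt.dvector]
        otypes = [tt.dvector]

        def __init__(self, data, eps=1e-6):
            self.data = data
            self.eps = eps

        def _loglike(self, theta):
            pred = predictRF(*theta)
            return -np.sum(np.abs(self.data - pred))

        def perform(self, node, inputs, outputs):
            (theta,) = inputs
            base = self._loglike(theta)
            grad = np.empty_like(theta)
            for i in range(theta.size):
                shifted = theta.copy()
                shifted[i] += self.eps
                grad[i] = (self._loglike(shifted) - base) / self.eps
            outputs[0][0] = grad

    class RFLogLikeWithGrad(tt.Op):
        """Black-box log-likelihood that also reports a gradient to Theano."""
        itypes = [tt.dvector]
        otypes = [tt.dscalar]

        def __init__(self, data, eps=1e-6):
            self.data = data
            self._grad_op = RFLogLikeGrad(data, eps)

        def perform(self, node, inputs, outputs):
            (theta,) = inputs
            pred = predictRF(*theta)
            outputs[0][0] = np.array(-np.sum(np.abs(self.data - pred)))

        def grad(self, inputs, output_grads):
            (theta,) = inputs
            # chain rule: scale the partials by the upstream scalar gradient
            return [output_grads[0] * self._grad_op(theta)]

With six parameters, each gradient evaluation costs about seven forward solves, which is consistent with NUTS/HMC being slower per draw here.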

A little side note: I found the warning

Initializing NUTS failed. Falling back to elementwise auto-assignment.

somewhat misleading, as to me it implied that "elementwise auto-assignment" was a way of using NUTS without initializing. Additionally, it might be worth catching theano's NotImplementedError() and augmenting it with a message telling the user that this is, or could be, about a missing gradient implementation for the likelihood function.
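
Purely to illustrate that suggestion, the behavior could look something like the following (wording invented; in PyMC itself the catch would live inside the step-method setup rather than in user code):

    try:
        step = pm.NUTS()
    except NotImplementedError as err:
        # hypothetical, more informative re-raise
        raise NotImplementedError(
            "NUTS/HMC require the gradient of the model log-probability, "
            "but at least one Op in the model graph (e.g. a wrapped "
            "black-box likelihood) does not implement grad()."
        ) from err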

2 Likes

Do note that sampling speed is not necessarily the most important measure. It may be that NUTS takes 5 times as long as Metropolis but obtains a much higher effective sample size (less autocorrelated samples), meaning that you can get the same amount of information from fewer draws. This is often the case in comparisons with Metropolis.
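
A quick way to make that comparison concrete is ArviZ, which the InferenceData returned by pm.sample plugs into directly (short sketch; inference is the object from the code above):

    import arviz as az

    # effective sample size per parameter: compare samplers by
    # ESS per second rather than draws per second
    print(az.ess(inference))

    # az.summary also reports ess_bulk, ess_tail and r_hat
    print(az.summary(inference))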

1 Like