Getting 'Bad initial energy: inf' when trying to sample simple model


#1

I’m a PyMC3 (and Bayesian coding) novice, trying to sample what should be a very simple model. Specifically, I’m trying to model a straight line where there are two sources contributing to the observed scatter around the relation: first, the measurement uncertainties (which I have for each point); and second, the intrinsic scatter around the relation.

However, whenever I try to sample this, the NUTS sampler crashes with the error: ‘ValueError: Bad initial energy: inf. The model might be misspecified.’

Oddly, the pm.Metropolis sampler does work, but is far, far slower (needing >250,000 draws to get comfortably past autocorrelation). I struggle to see how a model this simple could be misspecified severely enough to crash the sampler.

Any help much appreciated! Complete code as follows:

# Import smorgasbord
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import theano.tensor as tt

# Data points (hardcoded for this example)
x_points = np.array([0.48574, 0.54206, 0.50821, 0.44593, 0.40873, 0.20569, 0.08038, 0.30518, 0.0117, 0.29974, 0.28977, 0.32806, 0.32496, 0.38357, 0.45617, 0.50823, 2.51709, 1.98823, 1.78779, 1.58633, 1.23946, 1.25441, 1.26188, 1.12205, 0.94541, 1.19891, 0.76849, 0.72581, 1.46885, 0.66112, 0.74459, 0.75426, 2.10072, 0.68885, 0.67268, 0.61371, 2.45453, 1.6277, 2.37253, 1.13018, 2.12139, 1.59796, 1.58462, 0.62712, 1.34953, 1.18484, 1.25131, 1.22118, 0.76266, 1.13954, 0.76647, 1.11719, 0.67664, 0.98378, 1.4421, 1.09343])
y_points = np.array([8.62361, 8.58473, 8.67064, 8.66751, 8.61314, 8.62788, 8.66276, 8.62828, 8.72265, 8.67101, 8.65285, 8.58985, 8.68368, 8.63772, 8.61623, 8.57461, 8.3868, 8.35645, 8.37346, 8.32565, 8.48552, 8.43744, 8.41082, 8.45783, 8.47807, 8.29072, 8.54428, 8.65165, 8.4924, 8.57238, 8.57266, 8.58867, 8.35816, 8.62443, 8.5474, 8.56266, 8.25485, 8.38961, 8.30199, 8.46212, 8.4369, 8.37571, 8.25825, 8.58458, 8.37271, 8.35321, 8.36903, 8.33309, 8.62743, 8.36992, 8.61193, 8.38307, 8.61207, 8.54401, 8.42055, 8.53708])
y_uncs = np.array([0.0248, 0.03579, 0.02606, 0.02971, 0.03022, 0.02502, 0.02804, 0.03037, 0.03231, 0.03425, 0.02947, 0.02574, 0.03058, 0.04417, 0.03605, 0.03535, 0.02104, 0.06762, 0.06982, 0.02122, 0.0294, 0.0323, 0.03731, 0.02887, 0.0274, 1.22413, 0.03399, 0.01714, 0.07119, 0.03327, 0.03677, 0.02801, 0.0187, 0.02771, 0.02941, 0.02072, 0.05694, 0.03161, 0.03679, 0.01321, 0.02408, 0.01769, 0.07938, 0.02357, 0.08276, 0.01337, 0.01767, 0.06916, 0.01688, 0.01702, 0.0297, 0.01779, 0.01614, 0.04123, 0.0342, 0.02855])

# Initially, use simple least-squares fit to inform priors and test values
leastsq_fit = np.polyfit(x_points, y_points, 1, w=y_uncs**-1)

# Set up PyMC3 model to model relation
model = pm.Model()
with model:

    # Specify prior for intercept
    intercept = pm.Normal('Intercept', mu=leastsq_fit[1], sd=y_points.std(), testval=leastsq_fit[1])

    # Specify prior for gradient
    gradient = pm.Normal('Gradient', mu=leastsq_fit[0], sd=leastsq_fit[0], testval=leastsq_fit[0])

    # Determine expected value
    z_expected = ( gradient * x_points ) + intercept

    # Calculate residuals, to inform psi
    residuals = z_expected - y_points

    # Specify prior for psi, the scale of the additional scatter arising from intrinsic variation
    psi = pm.HalfNormal('Psi', sd=tt.mean(tt.abs_(residuals)), testval=tt.mean(tt.abs_(residuals)))

    # Determine combined scatter by adding psi and uncertainties in quadrature
    sigma = ( psi**2.0 + y_uncs**2.0 )**0.5

    # Specify likelihood distribution of points
    z_likelihood = pm.Normal('z_likelihood', mu=z_expected, sd=sigma, observed=y_points)

    # Sample posterior distribution
    trace = pm.sample(draws=5000, njobs=4, tune=1000)
    #trace = pm.sample(step=pm.Metropolis(), draws=100000, njobs=4, tune=5000)

#2

Hi @Stargrazer82301, this is actually one of our FAQs; you can have a look at the answer there.

For example, in your case, running:

for RV in model.basic_RVs:
    print(RV.name, RV.logp(model.test_point))

gives:

Intercept 1.1370228148748596
Gradient -inf
Psi_log__ -0.7257913526447277
z_likelihood 73.49976264100106

And looking closer at gradient:

gradient = pm.Normal('Gradient', mu=leastsq_fit[0], sd=leastsq_fit[0], testval=leastsq_fit[0])

You can see that sd is defined as leastsq_fit[0], which is a negative value.
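A minimal sketch of the problem (plain NumPy, with hypothetical toy data standing in for the points above): for downward-sloping data, np.polyfit returns a negative gradient, and a Normal prior's sd must be strictly positive, so one fix is to wrap the scale in np.abs (or to pick a fixed positive scale):

```python
import numpy as np

# Toy downward-sloping data, standing in for the x/y points above
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([8.60, 8.50, 8.45, 8.35, 8.30])

# polyfit returns [gradient, intercept]; the gradient here is negative
leastsq_fit = np.polyfit(x, y, 1)
assert leastsq_fit[0] < 0

# A Normal prior's sd must be positive, so take the absolute value
# before using it as the prior scale, e.g. (sketch, not tested):
# gradient = pm.Normal('Gradient', mu=leastsq_fit[0],
#                      sd=np.abs(leastsq_fit[0]), testval=leastsq_fit[0])
sd_gradient = np.abs(leastsq_fit[0])
assert sd_gradient > 0
```

Whether np.abs of the fitted gradient is a sensible prior width is a modelling choice; any fixed positive value of roughly the right scale would also remove the -inf logp.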

I am actually surprised that Metropolis can sample this - it should not be able to…


#3

Thank you very much for your helpful (and quick) response! Glad to see it was me being daft; I’m impressed by how robust the Metropolis sampler turns out to be.


#4

I tried running that code in place of pm.sample(); it seems to only print the logp values at the initial point. So I gather this error refers only to the initial values.

In my case, none of the logp’s came out -inf, but I still get this error when sampling on the GPU (not on the CPU).


#5

Looks like it was just a nonsense error message; it turned out to be a GPU precision problem.
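If the sampler only fails on the GPU, one thing worth checking (an assumption, not a confirmed diagnosis for every case) is Theano's float precision: GPU backends default to float32, and forcing float64 makes GPU computations match CPU precision, at some speed cost:

```python
import theano

# Assumption: the bad-energy error stems from float32 rounding on the GPU.
# Force 64-bit floats so GPU results match CPU precision; this can also
# be set from the shell via THEANO_FLAGS='floatX=float64'.
theano.config.floatX = 'float64'
```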