Bad initial energy: inf or nan

Hello. When we run the code below several times with nothing changed, it sometimes runs successfully (mostly when we limit the size of train), and sometimes raises either ValueError: Bad initial energy: inf or ValueError: Bad initial energy: nan.
Here is the code:

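import pymc3 as pm
import theano.tensor as tt

# df is our full dataset (a pandas DataFrame); keep only the first 100 rows
train = df.head(100)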

with pm.Model() as model:
    BoundedNormal = pm.Bound(pm.Normal, lower=-10, upper=10)

    theta_sd = pm.InverseGamma('theta_sd', alpha=1 + 50, beta=2 + 50)
    theta_not_centered = BoundedNormal('theta_not_centered', mu=0, sd=theta_sd,
    theta = pm.Deterministic('theta', theta_not_centered - tt.mean(theta_not_centered))

    alpha = pm.Lognormal('alpha', mu=0.5, sd=1, shape=100)

    beta_not_centered = BoundedNormal('beta_not_centered', mu=0, sd=1,
    beta = pm.Deterministic('beta', beta_not_centered - tt.mean(beta_not_centered))

    lesson_not_centered = BoundedNormal('lesson_not_centered', mu=0, sd=1, shape=400)
    lesson = pm.Deterministic('lesson', lesson_not_centered - tt.mean(lesson_not_centered))

    intercept = BoundedNormal('intercept', mu=0, sd=1)

    theta_for_each_rows = theta[list(train['student_code'].values)]
    alpha_for_each_rows = alpha[list(train['exercise_with_level_code'].values)]
    beta_for_each_rows = beta[list(train['exercise_with_level_code'].values)]
    lesson_for_each_rows = lesson[list(train['lesson_code'].values)]

    linear_component = alpha_for_each_rows * (theta_for_each_rows - beta_for_each_rows) + lesson_for_each_rows + intercept

    proba_without_left_asymptot = pm.Deterministic('proba_without_left_asymptot', pm.math.sigmoid(linear_component))
    proba = pm.Deterministic('proba', train['min_success_rate'].values + (
            1 - train['min_success_rate'].values) * proba_without_left_asymptot)

    result = pm.Bernoulli('result', p=proba, observed=train['correctness'])

    trace = pm.sample()

Posts on similar subjects didn’t help us debug it. In particular, running

for RV in model.basic_RVs:
    print(RV.name, RV.logp(model.test_point))

didn’t show unusual values.

Thanks for your help!

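This issue has come up a couple of times (e.g. Limit or prevent unrealistic output of neural network). The reason is that the jitter added to the starting point sometimes pushes the initial value outside the support of the logp, so the initial energy comes out as inf or nan. That is also why checking the logp at model.test_point, which has no jitter, shows nothing unusual. We are working to make this more robust.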
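
To make the mechanism concrete, here is a toy sketch in plain Python, not PyMC3 code. It assumes the jitter is uniform noise in [-1, 1] added to the starting value, and it uses a made-up density (toy_logp, uniform on (-0.5, 0.5)) chosen only because its support is narrower than the jitter. All function names here are hypothetical.

```python
import math
import random


def toy_logp(x):
    """Log-density of Uniform(-0.5, 0.5); -inf outside the support."""
    return 0.0 if -0.5 < x < 0.5 else -math.inf


def jittered_start(test_value, rng, width=1.0):
    """Add uniform noise in [-width, width] to a starting value (assumed jitter)."""
    return test_value + rng.uniform(-width, width)


def count_bad_starts(logp, test_value, n_tries=1000, seed=0):
    """Count how many jittered starting values give a non-finite logp."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(n_tries):
        if not math.isfinite(logp(jittered_start(test_value, rng))):
            bad += 1
    return bad


test_value = 0.0
# The un-jittered starting value is fine ...
print("logp at test value:", toy_logp(test_value))
# ... but some of the jittered ones land outside the support.
print("non-finite starts out of 1000:", count_bad_starts(toy_logp, test_value))
```

The point is only that a finite logp at the un-jittered value says nothing about the jittered one, which matches the run-to-run randomness described in the question.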

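For now, you can pass init='adapt_diag' to pm.sample, i.e. trace = pm.sample(init='adapt_diag'), which starts from the test point without jitter and so avoids this problem.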