Preface: I’ve read some other posts about dealing with the “bad initial energy” error, and I found the response on this one helpful. By adding the keyword init='adapt_diag' to pm.sample(), I can get my model to run.
My question is about two similar models - one throws the error, but the other runs fine.
Here’s my model with two possible versions of a prior for beta:
```python
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    alpha = pm.Exponential('alpha', lam=1)
    beta = pm.Gamma('beta', mu=eo_beta_mean, sd=eo_beta_sd)
    # beta = pm.Normal('beta', mu=eo_beta_mean, sd=eo_beta_sd)
    mu = pm.Deterministic('mu', alpha * tt.exp(-beta * obs_laf))
    p = pm.Beta('p', 0.1, 0.1, testval=0.5)
    bkgs_at_laf = pm.NegativeBinomial('bkgs_at_laf', mu, p, observed=obs_bkgs_at_laf)
    trace = pm.sample(500, init=None)
```
If I run this with the Gamma prior on beta, things are great. No complaints. But if I switch to the Normal prior on beta with the same parameters, I get the bad initial energy error unless I use the init='adapt_diag' keyword mentioned above.
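For completeness, the only change that gets the Normal version to sample for me is the init keyword (a minimal sketch, reusing the model context from above):

```python
with model:
    # switching init=None -> 'adapt_diag' avoids the bad initial energy error
    trace = pm.sample(500, init='adapt_diag')
```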
While troubleshooting, I notice slightly different behavior between the Gamma and the Normal. I’m wondering if this is expected behavior. (I’m enough of a rookie with this Bayesian stuff that there might be an obvious reason for this…)
With the Gamma prior, model.test_point shows that the beta test point has been log-transformed.
But with the Normal prior, model.test_point shows that the beta test point has not been log-transformed.
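For reference, this is all I'm doing to compare the two cases; it only touches the standard model.test_point and model.free_RVs attributes:

```python
# Inspect the starting point and the variables the sampler actually works in.
print(model.test_point)   # Gamma prior: the entry shows up as 'beta_log__'
                          # Normal prior: the entry is just 'beta', untransformed
print(model.free_RVs)     # the (possibly transformed) free variables NUTS samples
```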
Reading the other posts about how jitter can cause problems with small initial values, I see why the small value of the Normal test point is problematic. But why isn't beta being log-transformed in the Normal case as well? Wouldn't that solve the problem?
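In case it helps, here's the check I've been running on the starting point (just the built-in model.check_test_point(); nothing custom):

```python
# Evaluate each variable's log-probability at the starting point;
# a -inf or nan entry flags the variable whose initial value breaks the logp.
print(model.check_test_point())
```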