I have a three level hierarchical model that has:

- hyper-hyper parameters (for populations)
- hyper parameters (for sub-populations)
- measurements

The measurements are N(mu, sigma) where the mu and sigma are sampled from the hyper parameters.

The hyper-parameters for each sub-population are mu ~ N(hyper-mu, hyper-sigma) and sigma ~ Gamma(mu=hyper-sd-mean, sigma=hyper-sd-sd).

I have been having troubles with divergence, so first I reformulated the mus as recommended using

```
offset=pm.Normal('offset', mu=0, sd=1)
mu = pm.Deterministic('mu', hyper_mu + offset * hyper_sigma)
pm.Normal('measurement', mu=mu, sd=sd)
```

That worked, but didn’t solve my divergence problem. So I moved to the next step, and reformulated the `sd`

:

```
sigma_offset = pm.Normal('sigma offset', mu=0, sd=1)
sigma = pm.Deterministic('sigma', largest(hyper_sigma_mu + sigma_offset * hyper_sigma_sd, 0))
```

But when I added this my chains started failing, with this error:

```
ValueError: Bad initial energy: inf. The model might be misspecified.
```

It’s quite possible either (a) there’s a deep reason behind this which I don’t know or (b) I have done something stupid (like a typo).

Can anyone suggest either what is wrong with reformulating the standard deviation hyper-parameters like this or how to find whatever bug there is? I suppose the latter would be to somehow look at the probability function so I can see what is wrong with my code?

I’m lost here and would be very grateful for any guidance.