Non-centered hierarchical model with constraints

I’m trying to fit a hierarchical model to some data where some of the variables are constrained
for physical reasons to be in an interval. The model (only the part that is relevant here)
in centered way reads something like this:

model = pm.Model()
with model:
    muO = pm.Uniform('muO',lower=8.40,upper=9.1)
    sigmaO = pm.HalfNormal('sigmaO',sd=1.)

    bounded_normal = pm.Bound(pm.Normal,lower=8.40,upper=9.1)
    AO = bounded_normal('AO', mu=muO, sd=sigmaO, shape=n)

This makes AO to be bounded variables in the physically relevant space. However, I’m facing
a problem if I want to write this in non-centered way to check whether I have the well-known
funnel (of hell, to be more dramatic as suggested by @twiecki :))

model = pm.Model()
with model:
    muO = pm.Uniform('muO',lower=8.40,upper=9.1)
    sigmaO = pm.HalfNormal('sigmaO',sd=1.)

    AO_offset = pm.Normal('AO_offset', mu=0, sd=1, shape=n)
    AO = pm.Deterministic('AO', muO + AO_offset*sigmaO)

This model does not allow to easily incorporate bounds for AO. Any idea of how to
do it efficiently?


1 Like

hmmm, not sure what is the best way to deal with this, but I would model A on the real line and then transfer it into a bounded Deterministic:

a, b = 8.40, 9.1
with model:
    muO = pm.Normal('muO',0, 100)
    sigmaO = pm.HalfNormal('sigmaO',sd=1.)

    AO_offset = pm.Normal('AO_offset', mu=0, sd=1, shape=n)
    Am = muO + AO_offset*sigmaO
    AO = pm.Deterministic('AO', (b - a) * tt.nnet.sigmoid(Am) + a)

Also, you don’t actually need to reparameterize to find out if you are in trouble. If everything is fine you shouldn’t see any warnings, get gelman_rubin close to 1 and a reasonable effective sample size. On top of that you can make scatter plots, especially between scale and location parameters and see if there are correlations for funnels.

Neat trick. This will make my hyperparameters slightly more difficult to understand but I think this can be done.

The point of this is that, even though the gelman_rubin test is close to 1, I’m watching some indications in sigmaO of lack of convergence, which is exactly the behavior found by @twiecki in

Would be cool if the Interval transform could be used here. It should definitely be possible but the API is not really tuned for it.

Actually, there is:

I can apply Bound to a distribution, but not to a Deterministic, right?

You are correct. You can bound a deterministic with tt.clip but it mess up the gradient at the boundaries. Other option is to used tt.switch for the value outside of your bound.

A first preliminary test with tt.clip looks promising. Thanks!

I encountered the same problem. Would it be possible to set the bounds on the offset variable? E.g., for variable a that is bounded between 0 and 1, could you do:

mu_a = pm.Bound(pm.Normal, lower=0, upper=1)('mu_a', mu=0.5, sd=0.5)
sd_a = pm.HalfNormal('sd_a', sd=0.5)
offset_a = pm.Bound(pm.Normal, lower=-mu_a/sd_a, upper=(1-mu_a)/sd_a)(
                    'offset_a', mu=0, sd=1, shape=n)

a = pm.Deterministic('a', mu_a + offset_a * sd_a)


You can do it this way, but I wont recommend it - you should model the random variable on the Real line and use logistic transformation instead.

I try to model standard deviation, which obviously must be positive, but in the centered parameterization I get a handful of divergences. I tried to naively re-parameterize with sigma = pm.Deterministic('sigma', global_sigma + offset * global_sigma_sd), and got a Bad Initial Energy error when sampling.
then in order to put a lower bound on the deterministic variable I tried with sigma.clip(0.0, np.inf) and tt.switch(sigma<0, 0.0, sigma), but again I always get a Bad Initial Energy error when sampling.
When instead of a 0-lower bound I tried 1e-15, with clip I get:

ValueError: Mass matrix contains zeros on the diagonal.

and with tt.switch:

The chain contains only diverging samples. The model is probably misspecified.

I tested with a latent variable:

offset = pm.Normal('offset', mu=0, sigma=1.0)
b_offset = pm.Bound(pm.Normal, lower=0.0)('bound_offset', mu=global_sigma + offset, sigma=1)
sigma = pm.Deterministic('sigma', b_offset * global_sigma_sd)

which gave no divergences and took 2-4 times longer to sample, but a + b * c != (a + b) * c, so that doesn’t seem like a good idea.
How can this be solved? Is the best option to transform the data, e.g. square demeaned data and log-transform and then model mu instead of sigma?
This is related to this ongoing struggle.