Non-centered parametrisation of bounded variables


I am trying to estimate parameters for a hierarchical model with two parameters (alpha and beta) for n subjects.
I have a ground truth (simulated) dataset with alphas ~ Uniform(0,1) and betas ~ Gamma(5,1) and I know that alpha’s range is [0,1], while beta’s range is [0,+inf].

Centered parametrisation of the parameters was giving poor estimates so I’ve been looking into non-centered parametrisation.

I’ve found an example Stan implementation of a model similar to mine, which uses Stan’s constrained real variables (alpha_b, beta_a, beta_b > 0, beta > 0) and Phi_approx to restrict the alphas to [0,1].
Alpha’s non-centered parametrisation has location ~ Normal(0,3) and scale ~ Cauchy(0,5), while the betas are drawn from a Gamma distribution whose shape and rate parameters are themselves drawn from Normal(1,5).
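Since my PyMC version uses the exact inverse probit (pm.math.invprobit) where the Stan model uses Phi_approx, I first checked that the two are essentially identical, so the link function itself shouldn’t be the source of the discrepancy (polynomial coefficients taken from the Stan documentation):

```python
import numpy as np
from scipy.stats import norm

def phi_approx(x):
    # Stan's Phi_approx: inv_logit(0.07056 * x**3 + 1.5976 * x)
    return 1.0 / (1.0 + np.exp(-(0.07056 * x**3 + 1.5976 * x)))

x = np.linspace(-3, 3, 61)
# norm.cdf is the exact standard normal CDF (the inverse probit)
max_err = np.max(np.abs(phi_approx(x) - norm.cdf(x)))
print(max_err < 0.01)  # True: the approximation is very close to Phi
```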

I’ve tried to re-implement this in pymc with the following code but the estimates are not really good.

    # alpha hyperpriors: group-level location and scale
    # (note: Stan declares alpha_b > 0, but a full Cauchy also allows
    # negative scales)
    alpha_a = pm.Normal('alpha_a', mu=0, sigma=3)
    alpha_b = pm.Cauchy('alpha_b', alpha=0, beta=5)
    alpha_raw = pm.Normal('alpha_raw', mu=0, sigma=1, shape=n_subj)

    # beta hyperpriors (unconstrained Normals, unlike the Stan model,
    # which declares beta_a, beta_b > 0)
    beta_a = pm.Normal('beta_a', mu=1, sigma=5)
    beta_b = pm.Normal('beta_b', mu=1, sigma=5)

    # non-centered parametrisation: invprobit maps the real line to [0, 1]
    alpha = pm.Deterministic('alpha', pm.math.invprobit(alpha_a + alpha_b * alpha_raw))
    beta = pm.Gamma('beta', alpha=beta_a, beta=beta_b, shape=n_subj)

The estimated alphas and betas are all within a very small range (alphas are tightly packed around 0.5, betas all around 3.5).

Does this non-centered parametrisation make sense, or are there better ways to do this for bounded variables?
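One thing I noticed while checking the beta hyperpriors: the Stan model constrains beta_a and beta_b to be positive, but a plain Normal(1, 5) puts a lot of mass on negative values, which are invalid Gamma parameters. A minimal scipy check:

```python
from scipy.stats import norm

# Probability that a Normal(1, 5) hyperprior draw is negative,
# i.e. an invalid shape/rate parameter for the Gamma distribution
p_negative = norm.cdf(0, loc=1, scale=5)
print(round(p_negative, 2))  # 0.42
```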

I’ve tried to restrict the problem to only estimate alpha, using a different non-centered parametrisation that has been employed in similar problems and similar models (in particular in the hBayesDM library):

    alpha_a ~ Normal(0, 1)
    alpha_b ~ HalfNormal(0, 0.2)
    alpha_raw ~ Normal(0, 1)
    alpha = InverseProbit(alpha_a + alpha_b * alpha_raw)

Despite the changes, I’m still having issues with this parametrisation.
Compared to the original parametrisation there are no divergences anymore, but the sampler consistently estimates alpha_a to be within [-3,-2], which makes the InverseProbit push the values of alpha very close to 0.
I’ve tried varying the priors for alpha_a and alpha_b, but the sampler consistently pushes alpha_a to negative values.
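To get a sense of the scale: the inverse probit is just the standard normal CDF, so alpha_a maps directly onto the typical alpha, and with the ground-truth alphas averaging around 0.5 I would expect alpha_a to sit near 0 rather than in [-3,-2]:

```python
from scipy.stats import norm

# invprobit == standard normal CDF: at the group level, alpha ~ Phi(alpha_a)
print(round(norm.cdf(-2.5), 3))  # 0.006: alpha_a around -2.5 collapses alpha to ~0
print(norm.ppf(0.5))             # 0.0: a mean alpha of 0.5 corresponds to alpha_a = 0
```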

I’ve included an example of the code in this notebook.

I’m not too sure how to interpret the pair plots of alpha against alpha_a and alpha_b.
Do these make sense, or is there some (maybe obvious) problem with them?


I’m wondering if anyone could provide any help with this, in particular with the following questions:

  • should the CustomDist logp return a single value or the likelihood for each subject? In the linked notebook, the logp function computes the choice probabilities for each subject using pytensor.scan and then returns a vector of length n_subj with the log-likelihood for each subject.
  • are the choices and the outcomes passed in the correct way to the logp function? They are merged together in a single observed tensor which is later “unpacked” inside the logp function.
  • I’ve attached the pair plots between the non-centered location (alpha_a) and scale (alpha_b) parameters and alpha itself. I’m not sure how to interpret these plots and whether there is something obviously wrong (and, hopefully, a fix for it).

I believe that the logp function is correct; I have tested it, and the resulting log-likelihoods are the same as those of a non-vectorised implementation.

Any help is greatly appreciated :slight_smile:


Traces where alpha_a becomes very negative, resulting in alpha being close to 0.

location alpha_a vs alpha

scale alpha_b vs alpha

Doesn’t matter, except for model comparison, where you would want the probability per subject or even per trial.
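For instance (a stand-in Gaussian example, not your actual logp): summing the per-subject vector gives the same total log-likelihood, which is all the sampler sees:

```python
import numpy as np
from scipy.stats import norm

# Stand-in example: a per-subject vector logp vs. a scalar summed logp.
# The total log-likelihood is identical; the vector elements get summed.
data = np.array([0.1, -0.3, 0.7])
per_subject = norm.logpdf(data)          # vector: one entry per subject
total_scalar = norm.logpdf(data).sum()   # scalar: what a summed logp would return
print(np.isclose(per_subject.sum(), total_scalar))  # True
```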

This should be fine.

Your model could still be unidentified or have a bug. You don’t have that many subjects; does it get better if you double them? Also, maybe try to set alpha_a to its true value? Does that at least converge to the correct estimates? If not, I suspect an error in the logp.

Hi Ricardo,

I’ve tested the models with 10, 50, 500 subjects and with 50, 200, 500 trials and the behaviour is always the same.

I changed the ground truth alphas to be sampled from a Normal(0.6, 0.1) and set alpha_a = 0.6.
In this case the estimates for alpha_b shift in the [2,2.5] range.

The model should work correctly with this task/setup (my model is almost a 1:1 conversion of a Stan model which works fine on the same task).
At this point it must be a bug in the logp, but I’m not sure what it could be: the logp gives the same NLLs as a non-vectorised implementation when run “deterministically” with fixed parameters.

Thanks for the help!