Why or why not to sample `global_param_sigma` in hierarchical models

From what I understand, this setup is quite typical for a hierarchical model:

with pm.Model() as model:
    global_param_mu = pm.Normal(‘global_param_mu, 0.0, 1.0)
    global_param_sigma = pm.HalfNormal(‘global_param_sigma’, 1.0)

    local_param = pm.Normal(‘local_param’, global_param_mu, global_param_sigma)

    mu = pm.Deterministic(‘mu’, x*local_param[local_idx])
    sigma = pm.HalfNormal(‘sigma’, 1.0)

    likelihood = pm.Normal(‘likelihood’, mu, sigma, observed = y)

And I’m wondering about global_param_sigma… I discovered that when I drop it and just enter 1.0 as the value for local_param sigma, my model samples much faster and give the same good result.

Since my data is standardized, I expect everything to be distributed normally with 0.0 mean and 1.0 stdev. So I thought this would be ok.

But still, I would like to here someone else’s thoughts of why or why not to sample global_param_sigma

I’m not expert, but:

There’s an issue with your model in that sizes or dimensions are not specified. local_param seems to be indexed by local_idx, but it’s not at all clear how many of them there are.

If your data is standardized, that means that the overall standard deviation (of the observed data y) should be 1.

This standard deviation is calculated on the data and the model says it arises from three different sources: the variance of y around mu (here called sigma), the variability in mu caused by x, and the vaiability in the modulation of the effect of x that comes from variability in local_param (called global_param_sigma).

Together these different sources of variability in y have to sum to be 1, but since variability is positive, they cannot each themselves be 1. In fact, what you are doing by having the two parameters, sigma and global_param_sigma is partitioning the variance of y into three variances as described above. Properly normalized, they will sum to the total variance which is 1.

Note also, that they will not sum to exactly 1. It is only the variance of the standardized data that is 1. The variance of the population is a parameter with a distribution. This distribution can be found by addng up the (properly normalized) (samples of the) components of the variance specified in the model as I discussed above.

I hope this helps. I worry I was not clear.
Opher

1 Like