Why or why not to sample `global_param_sigma` in hierarchical models

mattiasthalen · October 17, 2022, 5:54pm

From what I understand, this setup is quite typical for a hierarchical model:

import pymc as pm  # assuming PyMC v4+; older installs use `import pymc3 as pm`

with pm.Model() as model:
    global_param_mu = pm.Normal('global_param_mu', 0.0, 1.0)
    global_param_sigma = pm.HalfNormal('global_param_sigma', 1.0)

    local_param = pm.Normal('local_param', global_param_mu, global_param_sigma)

    mu = pm.Deterministic('mu', x * local_param[local_idx])
    sigma = pm.HalfNormal('sigma', 1.0)

    likelihood = pm.Normal('likelihood', mu, sigma, observed=y)

And I’m wondering about global_param_sigma… I discovered that when I drop it and just enter 1.0 as the sigma of local_param, my model samples much faster and gives the same good result.

Since my data is standardized, I expect everything to be distributed normally with 0.0 mean and 1.0 stdev. So I thought this would be ok.

But still, I would like to hear someone else’s thoughts on why or why not to sample global_param_sigma.

opherdonchin · October 19, 2022, 8:14am

I’m not an expert, but:

There’s an issue with your model in that sizes or dimensions are not specified. local_param seems to be indexed by local_idx, but it’s not at all clear how many of them there are.
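To make the dimension question concrete, here is a minimal NumPy sketch of the generative process the model seems to describe, assuming there is one local_param per group. The names `n_groups` and `n_obs` and all numeric values are invented for illustration and are not from the original post. In PyMC, the corresponding step would be giving local_param an explicit size, for example through its `shape` argument.

```python
import numpy as np

rng = np.random.default_rng(0)

n_groups = 5   # hypothetical number of groups
n_obs = 200    # hypothetical number of observations

# Group membership of each observation: integers in [0, n_groups)
local_idx = rng.integers(0, n_groups, size=n_obs)
x = rng.normal(0.0, 1.0, size=n_obs)

# Illustrative values for the global parameters (not estimates)
global_param_mu = 0.5
global_param_sigma = 0.3
sigma = 0.8

# One slope per group: this is where the size has to be stated
local_param = rng.normal(global_param_mu, global_param_sigma, size=n_groups)

# Indexing expands the per-group slopes to one value per observation
mu = x * local_param[local_idx]
y = rng.normal(mu, sigma)
```

If local_param were a single scalar instead, `local_param[local_idx]` would have nothing to index into, which is why the number of groups matters here.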

If your data is standardized, that means that the overall standard deviation (of the observed data y) should be 1.

This standard deviation is calculated on the data, and the model says it arises from three different sources: the variance of y around mu (here called sigma), the variability in mu caused by x, and the variability in the modulation of the effect of x that comes from variability in local_param (called global_param_sigma).

Together, these sources of variability in y have to sum to 1, but since each contribution is non-negative, they cannot each be 1 themselves. In effect, by having the two parameters sigma and global_param_sigma, you are partitioning the variance of y into the three variances described above. Properly normalized, they sum to the total variance, which is 1.
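A rough numerical check of this partition, as a sketch only. It assumes that x is also standardized and independent of the group slopes, in which case the variance of y works out to global_param_mu**2 + global_param_sigma**2 + sigma**2. The parameter values, `n_groups`, and `n_obs` are made up, chosen so that the three pieces add up to 1 while none of the standard deviations is itself 1.

```python
import numpy as np

rng = np.random.default_rng(1)

n_groups = 20_000     # hypothetical; large so sample moments are stable
n_obs = 1_000_000     # hypothetical

# Illustrative values: 0.36 + 0.2304 + 0.4096 = 1.0
global_param_mu = 0.6
global_param_sigma = 0.48
sigma = 0.64

local_param = rng.normal(global_param_mu, global_param_sigma, size=n_groups)
local_idx = rng.integers(0, n_groups, size=n_obs)
x = rng.normal(0.0, 1.0, size=n_obs)
y = x * local_param[local_idx] + rng.normal(0.0, sigma, size=n_obs)

# The three contributions to Var(y) under the stated assumptions
parts = {
    "average slope acting on x": global_param_mu**2,
    "slope variation between groups": global_param_sigma**2,
    "residual noise": sigma**2,
}
predicted_var = sum(parts.values())
simulated_var = y.var()

print(parts)
print(f"predicted {predicted_var:.3f}, simulated {simulated_var:.3f}")
```

Under these assumptions, fixing the sigma of local_param at 1.0 would by itself use up the whole variance budget of a standardized y, leaving nothing for the other two terms.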

Note also that they will not sum to exactly 1. Only the sample variance of the standardized data is exactly 1; the variance of the population is a parameter with its own distribution. That distribution can be found by adding up the (properly normalized) components of the variance specified in the model, one posterior sample at a time, as discussed above.
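One way this could look in practice, as a sketch: compute the implied total variance for each posterior draw and then look at its spread. The helper `implied_total_variance` is hypothetical, and it again assumes x is standardized and independent of the slopes. The draws below are synthetic stand-ins, not real posterior samples; with an actual trace you would pull the three variables out of it instead.

```python
import numpy as np


def implied_total_variance(mu_draws, group_sigma_draws, sigma_draws):
    """Per-draw variance of y implied by the model.

    Assumes x has mean 0 and variance 1 and is independent of the
    group slopes, so the three contributions simply add.
    """
    mu_draws = np.asarray(mu_draws)
    group_sigma_draws = np.asarray(group_sigma_draws)
    sigma_draws = np.asarray(sigma_draws)
    return mu_draws**2 + group_sigma_draws**2 + sigma_draws**2


# Synthetic stand-ins for posterior draws (placeholders for illustration).
# With a real trace these would be the flattened draws of
# global_param_mu, global_param_sigma and sigma.
rng = np.random.default_rng(2)
n_draws = 4000
mu_draws = rng.normal(0.6, 0.05, size=n_draws)
group_sigma_draws = np.abs(rng.normal(0.48, 0.05, size=n_draws))
sigma_draws = np.abs(rng.normal(0.64, 0.02, size=n_draws))

total_var = implied_total_variance(mu_draws, group_sigma_draws, sigma_draws)

# Summarize the distribution of the implied population variance
low, mid, high = np.percentile(total_var, [3, 50, 97])
print(f"median {mid:.3f}, 94% interval [{low:.3f}, {high:.3f}]")
```

The point is only that the total is a distribution centered somewhere near 1, not a fixed number.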

I hope this helps. I worry I was not clear.
Opher
