Understanding hierarchical models

Hi team,
I was working on hierarchical models in Bayesian inference using PyMC3 and I had some questions about the approach.
1. Suppose I have 4 different groups in my data, and I am using a normal distribution for the local (group-level) distributions. To model the mean and standard deviation of each local distribution, I create global distributions, or hyperpriors. The question is: in a particular ADVI iteration, is the same value sampled from the hyperpriors for all 4 local distributions, or are different values sampled?
2. How do hyperpriors get updated? The same way our local distributions are updated, or in a different way?
Thank you

The hyperprior will take on just one value for a given iteration of ADVI (or any of the other posterior inference methods).
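To make the one-value-per-iteration point concrete, here is a numpy-only sketch of the generative logic for a single iteration. All names here are illustrative, not from anyone's actual model in this thread:

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups = 4

# In a single iteration, each hyperprior takes on exactly ONE value...
mu_global = rng.normal(0.0, 1.0)             # one draw for the global mean
sigma_global = np.exp(rng.normal(0.0, 1.0))  # one draw for the global sd (lognormal)

# ...and that same pair parameterizes all 4 local (group) distributions,
# which then produce 4 different group-level values.
group_means = rng.normal(mu_global, sigma_global, size=n_groups)
```

The group means come out different from each other, but within this iteration they were all drawn using the same hyperprior values.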

How do hyperpriors get updated? The same way our local distributions are updated, or in a different way?

Hyperpriors get updated the same way as all other parameters. The only weird bit is that their influence on the likelihood is indirect: it has to propagate through the lower-level priors. But that doesn’t change the basic mechanics - prior density that explains the data poorly will shift to regions of the parameter space that explain the data well.

If you have four different hyperprior parameters, \mu_1, \ldots, \mu_4 and \sigma_1, \ldots, \sigma_4, then they will almost always get different values (the probability of them having the same values is technically zero outside of initialization). This is because the hyperpriors are treated the same way as other parameters in ADVI or MCMC.

Thanks Bob.

Hi Daniel, thank you for your time. I do understand the updating part, but I am not sure about the one-value part, as I have seen several other authors saying that different values of the mean and standard deviation are picked from the global prior. I have plotted the posteriors for both my local prior and my global prior, and their means are nowhere close.

The language around this is tricky. That’s why I like writing things out in math. If there are M groups of effects with N_m members each, and you have a prior like this

\qquad \alpha_{m, n} \sim \text{normal}(\mu_m, \sigma_m) for m < M, n < N_m,

and then hyperpriors like this,

\qquad \mu_m \sim \textrm{normal}(0, 1) for m < M and

\qquad \sigma_m \sim \textrm{lognormal}(0, 1) for m < M,

then you will have \mu_m \neq \mu_{m'} if m \neq m'.

I’m pretty sure @daniel-saunders-phil meant that each of the \alpha_{m, n} for a fixed m and n < N_m have the same prior (same parameters \mu_m, \sigma_m), which is clear when you write out the math.

[edit: forgot subscripts on N]

Thanks Bob, it’s much clearer now. Can you also provide a research paper or article with more details on this?

Sure. Gelman and Hill’s multilevel regression textbook is really nice, but it’s in R. The relevant sections of Gelman et al.'s Bayesian Data Analysis are good and it’s available free online through the book’s home page. It’s much more mathy.

There are a lot of shorter tutorials around. Here are two from PyMC. One’s based on one of Gelman’s papers:

and one’s based on McElreath’s book:

McElreath’s book Statistical Rethinking is a really great place to start. Although it was written for R and Stan, the lessons are all very high level and general, abstracted from implementation details. I think all the examples have been translated to PyMC.
