Understanding hierarchical models

A parameter with a hyperprior will take on just one value in a given iteration of ADVI (or any of the other posterior inference methods).

How do hyperpriors get updated: in the same way our local distributions are updated, or in some different way?

Hyperpriors get updated the same way as all other parameters. The only weird bit is that their influence on the likelihood is indirect: it has to propagate through the lower-level priors. But that doesn’t change the basic mechanics. Prior density sitting in regions that explain the data poorly shifts to regions of the parameter space that explain the data well.
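
To make this concrete, here is a minimal sketch in plain Python, not tied to PyMC or ADVI. It assumes a toy model that is not from the thread: y_j ~ Normal(theta_j, sigma), theta_j ~ Normal(mu, tau), mu ~ Normal(0, 10), with sigma and tau treated as known. The data values and all variable names are made up for illustration. The group means theta_j are integrated out analytically, so the data reach mu only through the lower-level prior, and the posterior over mu is computed on a grid.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a Normal(mean, sd) distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

# Hypothetical data: one observation per group.
y = [4.1, 5.3, 6.0, 4.7, 5.4]

sigma = 1.0                        # observation noise sd (assumed known)
tau = 1.0                          # sd of group means around mu (assumed known)
prior_mean, prior_sd = 0.0, 10.0   # hyperprior: mu ~ Normal(0, 10)

# With theta_j integrated out, each y_j ~ Normal(mu, sqrt(sigma^2 + tau^2)).
# This is the indirect path from mu to the likelihood.
marginal_sd = math.sqrt(sigma ** 2 + tau ** 2)

def log_posterior_unnormalized(mu):
    log_p = normal_logpdf(mu, prior_mean, prior_sd)      # hyperprior term
    for y_j in y:
        log_p += normal_logpdf(y_j, mu, marginal_sd)     # marginal likelihood term
    return log_p

# Grid approximation of the posterior over mu on [-20, 20].
grid = [i * 0.01 for i in range(-2000, 2001)]
log_p = [log_posterior_unnormalized(mu) for mu in grid]
max_log_p = max(log_p)
weights = [math.exp(lp - max_log_p) for lp in log_p]     # subtract max for stability
total = sum(weights)

post_mean = sum(mu * w for mu, w in zip(grid, weights)) / total
post_var = sum((mu - post_mean) ** 2 * w for mu, w in zip(grid, weights)) / total
post_sd = math.sqrt(post_var)

print(f"prior on mu:     mean={prior_mean:.2f}, sd={prior_sd:.2f}")
print(f"posterior on mu: mean={post_mean:.2f}, sd={post_sd:.2f}")
```

The hyperprior starts out centered at 0 and very wide, while the posterior over mu ends up concentrated near the average of the observations. The update rule is the usual one (prior times likelihood); the only difference from a local parameter is that the likelihood term for mu is the one obtained after passing through the prior on theta_j.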