Notation for hierarchical models

I am having trouble understanding how to correctly build hierarchical models in PyMC3. Specifically, in a hierarchical model, when we indicate that a random variable’s parameter value comes from another random variable (the hyperparameter), how does PyMC3 “know” when I want this to be a new sample from the hyperparameter, and when it is shared? I.e., let’s say I have three populations, X, Y, Z, each normally distributed, and their mean is distributed as some other random variable, H. How do I differentiate between these two generative models:

  1. Each population has its own mean, so there are three i.i.d. samples from H.
  2. X, Y, and Z all have the same mean, which is unknown, and is H = h.

I was looking at the hierarchical model for Rugby page which has this snippet of code:

atts_star = pm.Normal("atts_star", mu=0, sd=sd_att, shape=num_teams)

I believe that this means that each team has its own standard deviation, but I don’t know if that is the case. Would it be possible for someone to supply two snippets of code equivalent to this, one in which the teams all have the same (unknown) standard deviation, and one in which they each have their own (unknown) standard deviations?
Thanks!

You don’t need H then.

X = Normal (“X”, mu = mu_x, sd=sd_x)
Y = Normal (“Y”, mu = mu_y, sd=sd_y)
Z = Normal (“Z”, mu = mu_z, sd=sd_z)
#the following part can be anything. the idea is that you need a likelihood with observations assigned to it
#assuming x, y and z are coefficient of a linear regression model
mu = Xx1 + Yx2 + Z*x3
likelihood = Normal(“y”, mu = mu, sd = sd_y, observed = y)

Here mu_x, sd_x etc can be any value. The does not have to be hyper-priors (random variables - RVs). However, if you want a hierarchical model with hyper-priors, then you can assign mu_x, mu_y, mu_z and even sd_* (any sd parameter) to a random variable. Notice that if you want X, Y and Z to have random variables as their own mean (without any correlations between each other means) then each mean should have its own random variable.

mu_x = Normal(“mu_x”, mu = 0, sd = 10)
mu_y = Normal(“mu_y”, mu = 0, sd = 10)
mu_z = Normal(“mu_z”, mu = 0, sd = 10)

And it should be noted that mu = 0 does not means that even after the inference (sampling) the mean of above RVs will be 0. Above RVs are just the priors. During the sampling they will be changed according to the observations that we have in likelihood. During the inference the each free RVs (if you don’t assign observations then those RVs are free RVs and only they are changed during the inferences) will be sampled using the relationships between each random variable. (try plotting the traceplot to understand this)

I think now you can think the answer for this question. However, for you convenience it should be something similar to below.

H = Normal (“H”, mu = 0, sd = 10)
X = Normal (“X”, mu = H, sd=sd_x)
Y = Normal (“Y”, mu = H, sd=sd_y)
Z = Normal (“Z”, mu = H, sd=sd_z)

Notice that all the X, Y and Z has same mu = H.

No, this means that all the teams have same Random Variable as their standard deviation. Therefore, all the teams have same hyper-prior for their std dev. That is to represent that the each atts_star of each team are correlated. Check this tutorial then you will understand why such single hyper-prior is important for hierarchical models. However, notice that you can’t assign same hyper-prior to mu here that would imply that the mean of all the atts_star of each team will behave same.

Already answered using the means. However, if you wants to write everything combined together in few lines,

sd_att = pm.HalfStudentT('sd_att', nu=3, sd=2.5, shape = num_teams)

atts_star = pm.Normal("atts_star", mu=0, sd=sd_att, shape=num_teams)

Another small tip is that, when you are unsure about the patient of the random variable (i.e., whether they are sharing the same prior), you can check the shape of the parameter. For example, in

atts_star = pm.Normal("atts_star", mu=0, sd=sd_att, shape=num_teams)

You can do sd_att.tag.test_value to print the default value of the variable. If it has the same shape as (num_teams), it means in atts_star each get their own prior for sd.

Thank you very much for the explanation. I think I understand. Let me state my current understanding below:

  1. All mentions of a previously existing variable in PyMC3 will indicate a single random variable, rather than a set of variables with the same distribution.
  2. The exception to this is the case of observed parameters for random variables – these indicate observations, each of which is an IID sample from the same random variable.

I think you can see from the above two why PyMC3 can be confusing: the way “observed” works seems to change the semantics of the way random variables are interpreted. So I could rephrase the above duality as:

Whenever you refer to a hidden (unobserved) random variable, you are referring to a unique such variable. All simultaneous references to that variable, for sampling purposes, will see the same value. However, whenever you refer to an observation, you are referring to single IID sample from a random variable.

So “hidden” random variables co-refer, but “visible” random variables do not co-refer.

As a notation, this seems quite confusing to me, and I’m still not entirely sure I get the underlying principle. Hyperparameters are used to set the parameters of random variables, but then those random variables can be sampled multiple times. The hyperparameters, however, cannot be – conceptually a population of IID samples from a random variable with parameters corresponds to a single sample from the hyperparameters.

So if I want to say there are multiple sub-populations, with unknown means and variance, from what is conceptually a single population, I would have to have hyper-hyper-parameters for the single population, which would be used to generate hyper-parameters for each sub-population, which would generate population distributions, which would then be sampled (i.e., which would have a list of values as their observed).

Just to see if I understand all the subtleties, let’s imagine that I have a model like the above: we have a H’’ set of hyper-hyper-parameters that generates H’1 H’2 and H’3, hyper-parameters for sub-populations with unknown means and variance. H’1 H’2 and H’3 in turn generate X, Y, and Z which provide the distribution for three different sub-populations. The observations are samples of X, Y, and Z.

Let’s imagine I have evidence about one of the hyperparameters. E.g., I have an expert in the field who has strong intuitions about the mean and variance of sub-population X. How could that be encoded? As I think about this as a causal (Bayes) net, I might associate virtual evidence to H’1. Is that possible with PyMC3? And if so, how would it be done? Note that I don’t need to do this, so “it can’t be done” is fine with me. But I think it’s a neat example for showing what seem like odd semantics for the notion of random variable and observation in PyMC3, considered as a programming language.

Thanks for all of your help!

I think the confusion sometime arise from the fact that PyMC3 currently does not differentia different shape of a Random variable - sample, batch shape, and event shape (reference see https://arxiv.org/pdf/1711.10604.pdf). There are a bit more thinking in this WIP PR: WIP Refactor Distribution and random variables by aseyboldt · Pull Request #2833 · pymc-devs/pymc · GitHub. But basically, it is not completely correct that:

As one named random variable can have different batch shape, which makes it a set of variables instead of a single one.

You design more informative prior for X, namely H’1

Thanks again for the response. The shape issue helps me understand a little better. Presumably if I have a variable X, of shape k, and it gets its parameter from another variable, H, also of shape k, PyMC3 will know that each X gets a variable from H.

As for the evidence about the hyperparameters, Pearl suggests that expert opinion (especially for multiple experts) can be incorporated into a model by means of “virtual evidence,” which is what led me to the question.

As always, thanks for the help,
r