Hierarchical model varying indexing of distributions

Hello,

I am trying to make a hierarchical model of my data as there is some structure. I run into a problem of how to code it. There are measurements done at three locations and on different number years. Like so: Location1: 3 years, Location2: 8 years, Location3:2 years, I have no reason to think say that location1 year 2020 has same mean as location3 year 2020. I have more variables, where it would be more complicated to write it out 50 x 400 :frowning:
If I do this:

beta_location_year = dict()
for location in data["location"].unique():
    tmp_year = data[data["location"]==location]["year"]
    beta_location = pm.Normal("beta_"+ location,mu=0,sigma=1)
    for year in tmp_year.unique():
        name = location + "_" + str(year)
        beta_location_year[name] = pm.Normal(name,mu=beta_location,sigma=1)

I get error complaining about loop fusion.

I could write it as:

beta_location = pm.Normal(“beta_location1”,mu=0,sigma=1,shape=3)
beta_location1_year = pm.Normal(“beta_location1”,mu=beta_location[0],sigma=1,shape=3)
beta_location2_year = pm.Normal(“beta_location2”,mu=beta_location[1],sigma=1,shape=8)
beta_location3_year = pm.Normal(“beta_location3”,mu=beta_location[2],sigma=1,shape=2)

But this is not feasible with other variables in the data. Is there a better way of doing it?

I have looked at Loop fusion failed - #8 by jessegrabowski , Notebook 4: Hierarchical Models — Bayesian Inference with PyMC and a few others, but I failed to find any there indexing was used.

Your problem seem almost the same as the issue you linked, just that you will do the indexing on the betas, not the intercept. So you will just make one vector of betas for all the times, and one vector of betas for the locations, then use fancy indexing to expand them to df.shape[0] add them together, like y = (beta_location[loc_idx] + beta_year[year_idx]) * X. Have you tried something like that?

I have tryed it, but I would like to have a hierarchy. I would like to see if the location of measurement has an effect on my observable. Each location has recordings on different years, so I am thinking that these have a mean of some distribution, say a normal one, based on location. But it is adjusted by some value based on year it was measured (different people worked, equipment changes). Say year 2018 at one location would not have the same effect as the same year 2018 at the other location, which I think that your suggestion would mean. I am new to this type of modelling so I apologise if I missunderstand how things work.

What I accidentaly learned is one can have things like:

beta_location = pm.Normal(“locations”, mu=0,sigma=1,shape=3)
beta_location_year = pm.Normal(“Location_years”, mu = beta_location, sigma=1,shape=(len(number_unique_years),3)

I always tried having shape argument as shape=(3,10), but that works is shape=(10,3) I just couldn’t figurre out how to name it so I could ask google or people about it.

Maybe this shines some light? Distribution Dimensionality — PyMC 5.8.0 documentation

1 Like

I think it’s also important to realize this equivalence:

rng = np.random.default_rng(1234)
beta_location_1 = rng.normal()
beta_location_year_1 = rng.normal(loc=beta_location_1, size=n_years)

rng = np.random.default_rng(1234)
beta_location_2 = rng.normal()
beta_year = rng.normal(size=n_years)
beta_location_year_2 = beta_location_2 + beta_year

np.allclose(beta_location_year_1, beta_location_year_2)
>>> Out: True

A model with additive parameters is a hierarchy, just written in a different way. You can compare how I wrote beta_location_year_2 with a non-centered parameterization to see that the years will be clustered at the mean determined by the location, exactly as you describe (just imagine there is a sigma_year hanging out in front of beta_year).

1 Like

Very small nitpick: Those are statistically the same but need not be numerically the same (allclose). It just happens that location/scale doesn’t influence numpy RNG, but they could in other algorithms/ loc-scale families (e.g., when the number of random bits consumed depends on the parameter values).

1 Like

I actually tried to do the example with a batch dimension and it didn’t work, I imagine because of exactly the issue you raise.

I really wanted something to clearly demonstrate the point though, because to this seems to be the important thing to grok: given priors from the loc-scale family, a nested hierarchical model can be pulled apart into a series of sums and offsets. This, combined with fancy indexing, makes it easy to work with ragged data structures.

Thank you the rng example. I tried your example, and thought about it, it makes sense.

But this got me confused. How does model/interpretation work if I have more than these two variables added? I would greatly appreciate if you could point me to a resource where I could learn more about this.