No, I mean that if you use a hierarchical model, by definition you include both varying and fixed effects. That’s why they are also called partial-pooling models – there is always this dance between the two types of effects, which results in shrinkage of the cluster parameters (aka varying effects) towards the population mean (aka fixed effects).
These models contrast with complete-pooling models (only fixed effects, so you risk total underfitting if your clusters vary a lot) and no-pooling models (only random effects, so you risk total overfitting if some clusters have a small sample size).
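For contrast, a complete-pooling version of, say, the radon regression would just have one intercept and one slope for all counties. Here is a quick sketch in the same fragment style as the tutorial code below (floor_measure is an illustrative predictor name, not necessarily the tutorial’s):
# Complete pooling: a single intercept and slope shared by all counties
a = Normal('a', mu=0., sigma=10.)
b = Normal('b', mu=0., sigma=10.)
y_hat = a + b * floor_measure  # no county index anywhere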
So that’s why the tutorial you linked to actually has both varying and fixed effects:
# Priors: these are the fixed effects, or the baseline effects if you will
mu_a = Normal('mu_a', mu=0., sigma=1e5)
sigma_a = HalfCauchy('sigma_a', 5)
mu_b = Normal('mu_b', mu=0., sigma=1e5)
sigma_b = HalfCauchy('sigma_b', 5)
# Random intercepts: each county's own intercept, drawn around the mean fixed intercept mu_a.
# The more different the counties, the bigger sigma_a.
# The parameters of the population of intercepts (mu_a and sigma_a) are learned
# from the data at the same time as each county's individual intercept (a).
a = Normal('a', mu=mu_a, sigma=sigma_a, shape=counties)
# Random slopes: same remarks
b = Normal('b', mu=mu_b, sigma=sigma_b, shape=counties)
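To complete the picture, those county-level intercepts and slopes then feed into the regression and the likelihood roughly like this (county_idx, floor_measure and log_radon are illustrative names for the data, not necessarily the tutorial’s exact ones):
# Expected value: each observation uses its own county's intercept and slope
y_hat = a[county_idx] + b[county_idx] * floor_measure
# Observation noise and likelihood
sigma_y = HalfCauchy('sigma_y', 5)
y_like = Normal('y_like', mu=y_hat, sigma=sigma_y, observed=log_radon)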
If you want a no-pooling, random-effects-only model, that would be:
# Random effects, without the hierarchical structure:
a = Normal('a', mu=0, sigma=10, shape=counties)
b = Normal('b', mu=0, sigma=1.5, shape=counties)
Most of the time, these two models won’t give the same results – the hierarchical one will likely be more accurate on average – except if your clusters are very different from one another, in which case pooling information is useless, because you don’t actually learn anything about cluster 2 once you’ve learned about cluster 1.
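If you want to check that on your own data, you can compare the two fitted models with LOO cross-validation, for instance with ArviZ. A sketch, assuming you already sampled both models into idata_hierarchical and idata_no_pooling (hypothetical names here):
import arviz as az

# Rank the models by estimated out-of-sample predictive accuracy (LOO)
comparison = az.compare(
    {"hierarchical": idata_hierarchical, "no pooling": idata_no_pooling},
    ic="loo",
)
print(comparison)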
I think Jack’s model is more complicated than what you need because he’s also estimating the covariation between slopes and intercepts. This usually yields a much more complicated geometry for the sampler to explore, so you have to use algebra tricks to help sampling – I think it’s too advanced for your use case.
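Just for reference (I really wouldn’t start there), the usual pattern for correlated intercepts and slopes in PyMC3 puts an LKJ prior on the Cholesky factor of their covariance, roughly like this (pm is pymc3, tt is theano.tensor):
# For reference only: correlated varying intercepts and slopes
packed_chol = pm.LKJCholeskyCov('packed_chol', n=2, eta=2., sd_dist=pm.HalfCauchy.dist(2.5))
chol = pm.expand_packed_triangular(2, packed_chol)
ab = pm.MvNormal('ab', mu=tt.stack([mu_a, mu_b]), chol=chol, shape=(counties, 2))
a = ab[:, 0]  # varying intercepts
b = ab[:, 1]  # varying slopes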
Modeling interactions actually just comes down to adding a term to the regression that models, well, the interaction between two (or more) predictors. Here I think I’d do:
# Hierarchical prior for the interaction coefficient, same structure as the other slopes
mu_b3 = Normal('mu_b3', mu=0., sigma=1e5)
sigma_b3 = HalfCauchy('sigma_b3', 5)
b3 = Normal('b3', mu=mu_b3, sigma=sigma_b3, shape=subjs)
y_hat = a[subj] + b1[subj] * blockType + b2[subj] * CSType + b3[subj] * blockType * CSType
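One practical note: if blockType and CSType are 0/1 dummy-coded, the interaction predictor is literally just their element-wise product, e.g. (illustrative values):
import numpy as np

blockType = np.array([0, 0, 1, 1])   # dummy-coded predictor 1
CSType = np.array([0, 1, 0, 1])      # dummy-coded predictor 2
interaction = blockType * CSType     # 1 only when both predictors are 1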
Again, with the caveat that the priors above are not best practice (I think I’ll do a PR to update this NB actually).
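For what it’s worth, something along these lines would be closer to weakly-informative priors (just a sketch, to be adapted to the scale of your data):
# Weakly-informative alternatives to the very wide tutorial priors
mu_b3 = Normal('mu_b3', mu=0., sigma=1.)
sigma_b3 = Exponential('sigma_b3', 1.)   # or HalfNormal('sigma_b3', 1.)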
Hope this helps