Covariance of multiple varying coefficients in hierarchical models

In Gelman & Hill, chapter 13 introduces the idea of allowing more than one regression coefficient to vary by group, and allows these coefficients to be correlated. In section 13.3, they introduce the scaled inverse-Wishart to model the covariance matrix of the coefficients. In section 13.4, under the heading ‘Understanding correlations between group-level intercepts and slopes,’ they give an example of correlated intercepts and slopes, and they almost completely remove that correlation just by centering the data.
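A small stdlib-only simulation can show why centering helps. It is a toy illustration of the mechanism, not the book’s example. The assumption is that each group’s line has an intercept measured at the data mean x_bar and a slope, drawn independently. Re-expressing the same lines with the intercept at x = 0 gives raw_intercept = centered_intercept - slope * x_bar. When x_bar is far from 0, this induces a strong negative correlation across groups. The helper names (`simulate_groups`, `pearson`) are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_groups(n_groups, x_bar, sd_center=1.0, sd_slope=1.0, seed=0):
    """Draw independent (centered intercept, slope) pairs, one per group,
    and also express each group's line with its intercept at x = 0."""
    rng = random.Random(seed)
    centered, slopes, raw = [], [], []
    for _ in range(n_groups):
        c = rng.gauss(0.0, sd_center)  # group's expected y at x = x_bar
        b = rng.gauss(0.0, sd_slope)   # group's slope
        centered.append(c)
        slopes.append(b)
        raw.append(c - b * x_bar)      # the same line, intercept at x = 0
    return centered, slopes, raw


x_bar = 10.0
c, b, a = simulate_groups(2000, x_bar)
theory = -x_bar / math.sqrt(1.0 + x_bar ** 2)  # with sd_center = sd_slope = 1
print("corr(centered intercept, slope):", round(pearson(c, b), 3))
print("corr(raw intercept, slope):     ", round(pearson(a, b), 3))
print("theoretical raw correlation:    ", round(theory, 3))
```

Because c and b are independent here, the induced correlation is -x_bar·sd_slope / sqrt(sd_center² + x_bar²·sd_slope²). It tends to -1 as x_bar grows and is exactly 0 when x_bar = 0, which is why centering the predictor removes most of it.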

With that context, I realized that I’ve seen many models which don’t bother modeling the covariance between group-level predictors:
The Varying Intercept/Varying Slope version of the Radon model from the pymc3 docs
The Hierarchical Model of the Premier League from my blog
The Hierarchical Model of Six Nations Rugby, from the pymc3 docs

I’m trying to wrap my mind around the implications of not modeling the covariance.

If I understand correctly, by not modeling the covariance, we’re in effect using an infinitely strong prior on the coefficients being independent. But in the soccer and rugby examples, I’d actually expect attacking and defending strengths to be strongly correlated, so I’m not encoding all of my prior knowledge into the model. This could come back to bite me if, say, it was early in the season and, based on results so far, one team appeared to be very strong in attack but very weak in defense. The model wouldn’t be able to rely on a) my prior knowledge that this is unlikely, or b) any evidence from the other teams in the league that attacking and defending strength usually go together.
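The early-season scenario can be sketched with a toy conjugate Gaussian model. This is an assumption and not the pymc3 soccer or rugby model. A team’s (attack, defense) strengths get a bivariate normal prior with correlation rho, and each is observed once with independent noise. Both are assumed to be parameterized so that larger means stronger; in a parameterization where a larger defense parameter means conceding more, the expected correlation would be negative instead. The function name, noise levels and observed values are hypothetical. With prior N(0, Sigma) and observation noise covariance R, the posterior mean is Sigma (Sigma + R)^-1 y.

```python
def posterior_mean(rho, obs, prior_sd=(1.0, 1.0), obs_sd=(0.5, 2.0)):
    """Posterior mean of (attack, defense) under a toy model:
    prior theta ~ N(0, Sigma) with correlation rho, data y ~ N(theta, R),
    R diagonal. Returns Sigma @ inv(Sigma + R) @ y using 2x2 algebra."""
    s_att, s_def = prior_sd
    cov = rho * s_att * s_def
    sigma = [[s_att ** 2, cov], [cov, s_def ** 2]]
    # Sigma + R (R is diagonal: independent observation noise)
    a = [[sigma[0][0] + obs_sd[0] ** 2, sigma[0][1]],
         [sigma[1][0], sigma[1][1] + obs_sd[1] ** 2]]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    a_inv = [[a[1][1] / det, -a[0][1] / det],
             [-a[1][0] / det, a[0][0] / det]]
    # gain = Sigma @ inv(Sigma + R)
    gain = [[sum(sigma[i][k] * a_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    return [sum(gain[i][j] * obs[j] for j in range(2)) for i in range(2)]


# Hypothetical early-season team: attack looks strong and is measured fairly
# precisely (obs_sd 0.5); defense looks weak but is measured noisily (obs_sd 2.0).
observed = (2.0, -2.0)
for rho in (0.0, 0.06, 0.8):
    att, dfn = posterior_mean(rho, observed)
    print(f"rho={rho:4.2f}  attack={att:+.2f}  defense={dfn:+.2f}")
```

With rho = 0 (the implicit assumption when the covariance is not modeled), the defense estimate is only shrunk toward zero on its own: 0.2 × -2 = -0.4. With rho = 0.8, the precise attack observation pulls the defense estimate above zero. A weak correlation such as 0.06 moves it only slightly. In a full hierarchical model, rho would itself be estimated from all teams, which is the "evidence from the other teams" in b).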

Am I thinking about this correctly? To make it more concrete: what is the worst thing that could happen from _not_ modeling the covariance of these coefficients?


Hi @DanWeitzenfeld, I had a question similar to yours that we discussed in this thread.

Like you, I noticed that some hierarchical models make use of the covariance matrix (especially the ones implemented in Stan), while others do not. I was wondering whether you found a clear explanation of this modelling technique. From my experiments, it does not seem to affect the fit much. To be fair, though, the dataset I used has a correlation between group-level predictors of only about 0.06 (i.e. very low).


Hi @Jack_Caster,

The clearest discussion of this that I’ve found is chapter 13 of Gelman and Hill.

Since first posting this, I’ve gained confidence that I’m understanding it correctly, i.e. that

by not modeling the covariance, we’re in effect using an infinitely strong prior on the coefficients being independent

… and that if you’re using a model for prediction, you may be losing predictive power by not modeling the covariance (e.g. the attacking/defending strengths correlation example in my post above).

It may be OK for your problem if you have no prior reason to believe that the group-level coefficients are correlated.


@Jack_Caster Someone just asked a similar question on the Stan discourse: