Correlation in Posterior for Slope Parameters (categorical)

Hello everyone :slight_smile:

I might have found an answer myself but I am looking for a sanity check.
I have the following simple linear regression:

y_{i} \sim N(\mu_{i}, \sigma)
\sigma \sim \text{HalfCauchy}(1)
\mu_{i} = \alpha + \beta_{j[i]}
\alpha \sim N(0, 1)
\beta_{j} \sim N(0, 1)

\beta_{j} is the coefficient for a categorical predictor with 4 levels
y has been standardized to have mean 0 and standard deviation 1
n=18,000
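
For concreteness, here is a minimal PyMC3 sketch of the model as written (the names y and cluster_idx are my own placeholders, not from the post):

```python
import numpy as np
import pymc3 as pm

# Hypothetical stand-ins: y standardized, cluster_idx in {0, 1, 2, 3}
rng = np.random.default_rng(0)
cluster_idx = rng.integers(0, 4, size=18_000)
y = rng.standard_normal(18_000)

with pm.Model() as model:
    alpha = pm.Normal("alpha", mu=0.0, sigma=1.0)
    beta = pm.Normal("beta", mu=0.0, sigma=1.0, shape=4)
    sigma = pm.HalfCauchy("sigma", beta=1.0)
    # mu is deterministic: shared intercept plus a level-specific offset
    pm.Normal("y_obs", mu=alpha + beta[cluster_idx], sigma=sigma, observed=y)
    trace = pm.sample()
```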

Issues I notice

  1. Sampling is painfully slow (given how simple the model is)
  2. The posteriors for the levels of the \beta parameter are almost perfectly correlated

I imagine point 2 is the cause of point 1, so I looked at the traceplots.

I noticed that the estimated parameters for the Cluster slopes (\beta) overlap completely and are centered on 0. My suspicion is that the Cluster predictor is completely uninformative with respect to the outcome (this was under investigation, so I couldn’t know it in advance), hence every level carries the same information, which gives rise to issue 2 and, from there, issue 1.

Am I correct in my intuition? Any extra word of wisdom on this?

Thank you

Hi!
You’re absolutely correct. Although I wouldn’t say they are correlated; rather, the posterior distributions of the groups overlap. Also, the coefficients \beta_j aren’t slopes: they’re conditional means of the outcome, or, in this case, the deviations from the fixed intercept term for each level.

The model you described above is also referred to as an ANOVA: https://online.stat.psu.edu/stat502/lesson/1

What the plot indicates is that the outcome is similar across levels. For instance, suppose the 4 levels are 4 different drugs and the outcome is reduction in blood pressure: if the posterior distributions of the 4 drugs looked like the above, there’d be very little evidence that any one of them reduces blood pressure more than the others.

I imagine the sampling is slow because of the number of observations.

Have you tried removing the intercept (\alpha)? It sounds like you might have an overparametrized model (5 de facto intercepts for 4 categories).
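
A minimal sketch of that suggestion, reusing the hypothetical names from the first post: drop \alpha and let each \beta_j act directly as the mean of its level.

```python
with pm.Model() as model_no_intercept:
    beta = pm.Normal("beta", mu=0.0, sigma=1.0, shape=4)
    sigma = pm.HalfCauchy("sigma", beta=1.0)
    # Without a shared intercept, beta[j] is simply the mean of y in level j
    pm.Normal("y_obs", mu=beta[cluster_idx], sigma=sigma, observed=y)
```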

Hello @dilsher_dhillon :slight_smile:

thank you very much for your answer, it is very reassuring!
Just a couple of additional points:

  1. Isn’t asserting that the posteriors for the levels of a categorical predictor overlap synonymous with saying they are correlated?

  2. I know categorical variables are indexed in PyMC3; the reason I call \beta_{j} a slope is that in my mind I see the linear model as a one-hot encoding, so \mu_{i} = \alpha + \sum_{j} x_{ij}\beta_{j} with x_{ij} the one-hot indicator (see the sketch at the end of this post). Also, I am making an effort to think of ANOVAs more in terms of linear models, as it seems to make things easier :rofl:

  3. I have run more complicated models with more data in a fraction of the time of the linear model specified above. I know NUTS tends to slow down considerably if parameters’ posteriors are correlated.

My suspicion was that since, as you said, the posteriors for the groups overlap (i.e., they appear perfectly correlated to the sampler), NUTS was having a hard time doing its job.
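
To make the one-hot view in point 2 concrete, here is a small illustrative numpy sketch (my own example values) showing that indexing \beta by level and multiplying a one-hot design matrix by \beta give the same \mu:

```python
import numpy as np

alpha = 0.5
beta = np.array([0.1, -0.2, 0.3, 0.0])
cluster_idx = np.array([0, 2, 1, 3, 2])

# Indexing view: pick out each observation's level coefficient
mu_indexed = alpha + beta[cluster_idx]

# One-hot view: each row of X has a single 1 marking the level
X = np.eye(4)[cluster_idx]
mu_onehot = alpha + X @ beta

assert np.allclose(mu_indexed, mu_onehot)
```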

Hi @ricardoV94 :slight_smile:

yes, I used to do that before, but it was occasionally giving me problems. I know that when you run a linear model like the one above in statsmodels, it automatically uses one of the categories as the intercept.

From previous experience in PyMC3, if you build the model in context-manager mode, that doesn’t happen (I might be disastrously wrong here), and overparametrizing seems to force one level of the category to become the intercept (that level’s parameter goes to zero and the intercept expresses the impact of that category).

Just to chime in, it’s often not possible to draw strong conclusions about the sampling or the posterior it generates without knowing more about a) how the model is implemented, b) the data you are using for inference, and c) the trace your sampling has produced. But I think folks have already laid out the relevant clues.

As @dilsher_dhillon mentioned, the similar-looking posteriors do not imply that the parameter estimates are correlated/dependent. Right now, you are plotting the marginal posteriors. To assess correlation/dependence, you would have to look at the joint posterior. So grab the trace object and crank out some scatter plots (e.g., plot_pair()).
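
For example, a minimal ArviZ sketch (assuming a trace like the one sampled above):

```python
import arviz as az

# Scatter plots of the joint posterior; tight diagonal bands
# between alpha and the betas indicate dependence
az.plot_pair(trace, var_names=["alpha", "beta"], kind="scatter")
```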

That immediately leads into @ricardoV94’s observation that the presence of the intercept is definitely going to cause some dependence (correlation) between the estimated value of \alpha and the estimated values of the \beta's. This always happens with regression-style models and is generally not that big a deal. The problem here, however, is that the “slopes” themselves are basically just intercepts. For example, \alpha=0, \beta_1=2, \beta_2=4, \beta_3=6, \beta_4=8 will give you exactly the same model as \alpha=5, \beta_1=-3, \beta_2=-1, \beta_3=1, \beta_4=3. And there are infinitely many such equivalent sets of parameter values.
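
A quick numeric check of that equivalence, just to illustrate the point:

```python
import numpy as np

beta_a = np.array([2.0, 4.0, 6.0, 8.0])   # with alpha = 0
beta_b = np.array([-3.0, -1.0, 1.0, 3.0])  # with alpha = 5

# Both parameter sets yield identical cell means, so the likelihood
# cannot tell them apart
assert np.allclose(0.0 + beta_a, 5.0 + beta_b)
```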

So inspect your trace, looking for dependence between the sampled values of \alpha and all the \beta's. Assuming you find strong dependence, omit \alpha and see how that goes. Or just drop \alpha right away and see how it goes (but then go back and check the original trace, because sampling diagnostics are important).

Hi again,

@cluhmann already described the issue of the overparametrization in detail.

I just wanted to clear up the confusion with statsmodels. If you run your model in statsmodels, indeed one of the categories is removed, so you end up with 1 intercept + 3 categories = 4 variables. It is perfectly fine to have a model with 0 intercept + 4 categories = 4 variables in PyMC3 (it would also be fine in statsmodels if that were an option there). But here you are actually going with 1 intercept + 4 categories = 5 variables. So you have more variables than you should, which leads to the issues that @cluhmann mentioned.
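
To see statsmodels’ default behavior, here is a small sketch (the DataFrame df with columns y and cluster is hypothetical, built from the placeholder data above):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"y": y, "cluster": cluster_idx.astype(str)})

# Default treatment coding: one level is absorbed into the intercept,
# leaving 1 intercept + 3 dummies = 4 parameters
fit = smf.ols("y ~ C(cluster)", data=df).fit()
print(fit.params)
```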

Hi @cluhmann :slight_smile:

I’ve been too hasty in my answer and left out important details:

  1. When I say perfectly correlated, I mean that this is the result coming from plot_pair(my_trace)

  2. Here is the plate diagram for the model; all the relevant information regarding the priors is specified in my original post

  3. As you and @ricardoV94 said, the intercept was causing the correlation problems!


I’ve been very silly.

Now I wonder, does this logic also apply to random intercepts?

This can happen if you have enough information (either from the prior or from the data) to identify the overparametrized model. But even if you can identify this more complicated model, what benefit do you get?
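
My reading of the random-intercepts case, as a hedged sketch (not code from the thread): a hierarchical prior whose group-level sd is learned shrinks the offsets toward zero, and that extra prior information can softly identify the otherwise overparametrized model.

```python
with pm.Model() as model_hierarchical:
    alpha = pm.Normal("alpha", mu=0.0, sigma=1.0)
    # Learned group-level sd pulls the offsets toward zero,
    # softly identifying the shared intercept
    sigma_b = pm.HalfNormal("sigma_b", sigma=1.0)
    beta = pm.Normal("beta", mu=0.0, sigma=sigma_b, shape=4)
    sigma = pm.HalfCauchy("sigma", beta=1.0)
    pm.Normal("y_obs", mu=alpha + beta[cluster_idx], sigma=sigma, observed=y)
```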

The posterior looks good. I have made the very same type of mistake before :slight_smile: