Why do you expect the correlation of b0 to be lkjcc_corr?
Here you have one “group” only, but as I said, lkjcc is a hyperprior and the MvNormal with covariance lkjcc is a prior.
If instead of using MvNormal as the prior for b0 you used a Normal one, would you expect them to be uncorrelated?
If instead of using lkjcc as covariance of the MvNormal you used a fixed value, would you expect the correlation of the posterior samples in b0 to be that fixed value?
If instead of having a single “group” you used the lkjcc with a hierarchical model, would you expect the correlations of all levels to match lkjcc? This is exactly the situation in the “A Primer on Bayesian Methods for Multilevel Modeling” notebook in the PyMC example gallery: the correlations in posterior samples between a and b in ab_county for each county range between -0.6 and -0.1.
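To make the questions above a bit more concrete, here is a toy sketch with NumPy only. It is not your model: I am assuming a plain linear regression with known noise scale and a zero-mean Gaussian prior on two coefficients, because in that case the posterior covariance has a closed form. The function name `posterior_corr` and all the numbers are made up for illustration.

```python
import numpy as np

def posterior_corr(prior_cov, X, sigma=1.0):
    """Posterior correlation matrix of b in y = X @ b + Normal(0, sigma) noise,
    with prior b ~ MvNormal(0, prior_cov) and sigma treated as known."""
    # Conjugate Gaussian result: posterior precision = prior precision + X'X / sigma^2
    precision = np.linalg.inv(prior_cov) + X.T @ X / sigma**2
    cov = np.linalg.inv(precision)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=200)  # predictors that are positively correlated
X = np.column_stack([x1, x2])

independent_prior = np.eye(2)                            # like two univariate Normal priors
fixed_corr_prior = np.array([[1.0, 0.5], [0.5, 1.0]])    # prior with fixed correlation 0.5

corr_indep = posterior_corr(independent_prior, X)[0, 1]
corr_fixed = posterior_corr(fixed_corr_prior, X)[0, 1]
print("independent prior ->", corr_indep)
print("prior corr = 0.5  ->", corr_fixed)
```

In this toy setup both posterior correlations come out strongly negative and nearly identical, because the likelihood dominates: the independent prior does not give an uncorrelated posterior, and the prior with correlation 0.5 does not give a posterior correlation of 0.5.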
Your case is similar to the model in the radon notebook that uses lkjcholeskycov, except that you only have one “group”. I generally struggle to interpret and work intuitively with hyperpriors, but I don’t see why lkjcc_corr needs to match the correlations between posterior samples. It might seem that having only one group means the hyperprior will learn the correlations in b0, but:
- Does it really have enough information to do so? Has it actually learned the correlations but stored them in the wrong position? [1]
- Does it get stuck in some kind of local minimum? After all, lkjcc is “just a prior”: the vast majority of models only use univariate priors, yet the vast majority of posteriors have correlations between the posterior variables.
- Does the model show any sign of not having converged? In many cases we can tell that a model hasn’t converged, but we never really get a convergence guarantee.
[1]. Instead of comparing the pdfs of each of the off-diagonal positions in lkjcc_corr to the mean correlation between variables in b0, you can use a subsampling bootstrap to estimate the pdf of the correlations in b0: if you have 1000 samples, take the first 100 and compute the correlation, then samples 1-100, 2-101, and so on, so you get about 900 values of the posterior correlation. Then check whether the distributions do indeed look the same but in the wrong matrix position, or whether they don’t.
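A minimal sketch of that sliding-window idea, assuming you have already flattened the b0 draws into a 2D array of shape (n_draws, n_variables). The helper name `windowed_correlations` is mine, and `fake_b0` is synthetic data standing in for your actual posterior draws.

```python
import numpy as np

def windowed_correlations(draws, window=100):
    """Correlation matrix for every contiguous window of `window` draws.

    draws: array of shape (n_draws, n_vars).
    Returns an array of shape (n_draws - window + 1, n_vars, n_vars).
    """
    draws = np.asarray(draws)
    n_windows = draws.shape[0] - window + 1
    return np.stack(
        [np.corrcoef(draws[s:s + window], rowvar=False) for s in range(n_windows)]
    )

# Synthetic stand-in for posterior draws of b0 (assumption: 2 variables, 1000 draws).
rng = np.random.default_rng(1)
true_corr = np.array([[1.0, -0.4], [-0.4, 1.0]])
fake_b0 = rng.multivariate_normal(np.zeros(2), true_corr, size=1000)

corrs = windowed_correlations(fake_b0, window=100)
offdiag = corrs[:, 0, 1]  # one correlation estimate per window
print(corrs.shape, offdiag.mean(), offdiag.std())
```

You can then plot a histogram or KDE of each off-diagonal entry of `corrs` next to the posterior of the matching (and non-matching) entries of lkjcc_corr. Keep in mind that consecutive windows overlap almost entirely, so these values are far from independent; treat the result as a rough picture of the spread, not a proper interval.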