Non-centered multivariate normal model with common variance

Hi all,

I am trying to build a Multivariate Normal model where the two covarying parameters are instances of the same population and need to have the same associated variance. I could not find any other discussion dealing with this, apologies if I missed it.

I first thought that this could be achieved by simply setting the shape of the standard-deviation distribution passed to LKJCholeskyCov to 1, like this:

import pymc as pm
import aesara.tensor as at  # using an older pymc version

with pm.Model():
    sd_dist = pm.Exponential.dist(1, shape=(1,))
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol_cov",
        eta=2,
        n=2,
        sd_dist=sd_dist,
    )

But this does not seem to make the model treat the two standard deviations as a single shared quantity, although I might not be interpreting the output correctly.
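One thing I tried in order to check this (assuming pm.draw is available in the pymc version I am using, and that I am reading its output correctly) was drawing the standard deviations returned by LKJCholeskyCov; the two entries differ within a single draw, which suggests they are not tied to one shared value:

# Assumed check, not part of the model above: if the standard deviations were tied,
# both entries of `stds` should be equal within a single draw.
print(pm.draw(stds))  # e.g. array([0.41, 1.87]) -- two different values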
Alternatively, I guess one needs to build the covariance matrix “manually” from the correlation matrix, using something like this?

with pm.Model():
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol_cov",
        eta=2,
        n=2,
        sd_dist=pm.Exponential.dist(1),
    )
    sigma = pm.Exponential("sigma", 1)
    cov = pm.Deterministic(
        "cov",
        var=pm.math.dot(
            at.diag(at.stack([sigma, sigma])),
            pm.math.dot(corr, at.diag(at.stack([sigma, sigma]))),
        ),
    )

Could the covariance in this form be directly multiplied by the z-offset in a non-centred model?
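Just for context, this is roughly the non-centred construction I have in mind (a sketch only; I am assuming it is the Cholesky factor of cov, rather than cov itself, that scales the z-offsets, and n_obs is just a placeholder for the number of observations):

# Sketch only: z, mu, x and n_obs are placeholder names, not from a working model.
chol_cov = at.linalg.cholesky(cov)                # Cholesky factor of the covariance
z = pm.Normal("z", 0, 1, shape=(2, n_obs))        # standard-normal offsets
mu = pm.Normal("mu", 0, 1, shape=(2, 1))
x = pm.Deterministic("x", mu + pm.math.dot(chol_cov, z))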

Thanks!

If you want to restrict the variance to a single value, you could draw the correlation matrix from an LKJCorr and then build a covariance matrix yourself. It would look something like this:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

n_units = 5
N = 100

with pm.Model(coords={'obs_idx':np.arange(N), 'unit':np.arange(n_units)}):
    L_corr_packed = pm.LKJCorr('correlation_matrix', eta=1, n=n_units)
    
    # k=1 to offset the upper-triangular indices by 1 to the right. 
    # Do this because L_corr_packed is the upper triangle excluding the main diagonal of 1's
    triu_idx = np.triu_indices(n_units, k=1)
    L_corr = pt.set_subtensor(pt.eye(n_units)[triu_idx], L_corr_packed)
    
    # Build the correlation matrix
    corr = L_corr @ L_corr.T

    # Spherical variance
    sigma = pm.HalfNormal('sigma', sigma=10)
    sigma_diag = pt.eye(n_units) * sigma

    # Build the covariance matrix
    cov = sigma_diag @ corr @ sigma_diag
    
    # From here we can do the non-centered draws
    L = pt.linalg.cholesky(cov)
    mu_unit = pm.Normal('mu_unit')
    x_offset = pm.Normal('x_offset', dims=['unit', 'obs_idx'])
    x = pm.Deterministic('x', (mu_unit + L @ x_offset).T, dims=['obs_idx', 'unit'])
    
    idata = pm.sample_prior_predictive()

Note that after doing this you won’t end up with equal variances across the units, since their (initially equal) variances will be mixed together via the correlations. There’s probably some clever adjustment you could do to ensure this doesn’t happen, but I don’t know it.
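If it helps, here is a quick NumPy version of the same construction with made-up numbers (the packed values 0.3, -0.2, 0.5 and sigma = 1.0 below are just placeholders), showing that the diagonal of cov does not stay constant:

import numpy as np

n_units = 3
packed = np.array([0.3, -0.2, 0.5])  # placeholder values for the packed upper triangle
sigma = 1.0                          # placeholder value for the common scale

L_corr = np.eye(n_units)
L_corr[np.triu_indices(n_units, k=1)] = packed
corr = L_corr @ L_corr.T

sigma_diag = np.eye(n_units) * sigma
cov = sigma_diag @ corr @ sigma_diag
print(np.diagonal(cov))  # [1.13 1.25 1.  ] -- not all equal, so the variances differ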

Hi @jessegrabowski,

Thanks so much for this, it is super helpful both for what I was trying to do and for learning more tools in the pytensor package! (I need to learn more about what is possible with it; is there any resource you could suggest?)

Could you expand on your comment at the end of your post? What do you mean by “you won’t end up with equal variances across the units”? Given that they are modelled by a single distribution, I was expecting to get only a single common posterior for sigma, and indeed this is what I see. I think I am not interpreting your comment correctly, though.

Thanks again,
Paolo

At the end I just meant that `np.all(np.diagonal(cov) == sigma)` is False. So if you expect the variances of each RV to be identical after doing all these operations, they won’t be.

For pytensor stuff the tutorial docs are quite nice; they’re written in a semi-textbook format with examples and “assignments” at the end of each section. I don’t think they’ve been updated since the Theano days, but I was just looking through them and they seemed right. If you run into anything that doesn’t work as expected, just post about it here :smiley:

Ah yes, got it, interesting. I don’t think I fully understand how this comes about, though, because the off-diagonal components of cov, which mathematically should just be \rho \sigma^{2}, are identical. If there is only one sigma distribution to sample from, what does it mean that the variances mix together?
I also noticed that the posterior of the correlation matrix is “weird”: while one of the diagonal elements is identically 1 as expected, the other learns a new distribution.
I wonder if something is up with how L_corr is constructed. Naively, in 2 dimensions, if I think of L_corr as:

\left(\begin{array}{cc} 1 & a\\ 0 & 1 \end{array}\right)

Then the correlation matrix as defined would give us:

\left(\begin{array}{cc} 1 & a\\ 0 & 1 \end{array}\right) \left(\begin{array}{cc} 1 & 0\\ a & 1 \end{array}\right) = \left(\begin{array}{cc} 1 + a^{2} & a\\ a & 1 \end{array}\right)

which explains what I am seeing for the components of the correlation matrix, but it doesn’t seem to make sense; then again, I don’t know enough about Cholesky factors of correlation matrices.
Maybe an alternative could be doing this (which is honestly just a way to avoid having to understand what is off with the construction of the corr matrix)?

chol, corr, stds = pm.LKJCholeskyCov(
        "chol_cov",
        eta=2,
        n=n_units,
        sd_dist=pm.Exponential.dist(1)
)
sigma = pm.HalfNormal('sigma', sigma=10)
sigma_diag = pt.eye(n_units) * sigma
cov = pm.Deterministic(
        "cov",
        var=sigma_diag @ corr @ sigma_diag
)

L = pt.linalg.cholesky(cov)
mu_unit = pm.Normal('mu_unit')
x_offset = pm.Normal("z_T", 0, 1, shape=(n_units, N_dyads))
x = pm.Deterministic('x', (mu_unit + L @ x_offset).T, dims=['obs_idx', 'unit'])

This takes advantage of the already constructed corr matrix, but I don’t know if it can cause other issues (the LKJ is still learning something about the “dummy” covariances, although they are almost identical). With this approach the diagonal elements of the corr matrix are now as expected; however, sampling is quite a bit slower and I get 2 divergences. Estimates for the relevant parameters are compatible between the two models, though.

Thanks for the tip on the pytensor doc, I will check it out!

Yes, I think what is returned is not the Cholesky factor of the correlation matrix (which is what I was assuming with the `L_corr @ L_corr.T`), but just the packed upper triangle of the correlation matrix itself. So the correct way to construct the correlation matrix is:

    corr_packed = pm.LKJCorr('correlation_matrix', eta=1, n=n_units)
    
    # k=1 to offset the upper-triangular indices by 1 to the right. 
    # Do this because corr_packed is the packed upper triangle excluding the main diagonal of 1's
    triu_idx = np.triu_indices(n_units, k=1)
    corr_upper = pt.set_subtensor(pt.zeros((n_units, n_units))[triu_idx], corr_packed)

    corr = pt.eye(n_units) + corr_upper + corr_upper.T

Then the covariance matrix will be correct (with identical values on the diagonal). Sorry about that!
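As a quick sanity check (again with made-up numbers for the packed correlations), the diagonal now comes out as all ones and the matrix is symmetric:

import numpy as np

n_units = 3
packed = np.array([0.3, -0.2, 0.5])  # placeholder correlation values

corr_upper = np.zeros((n_units, n_units))
corr_upper[np.triu_indices(n_units, k=1)] = packed
corr = np.eye(n_units) + corr_upper + corr_upper.T

print(np.diagonal(corr))          # [1. 1. 1.]
print(np.allclose(corr, corr.T))  # True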

Ah, definitely don’t worry, you are being super helpful!
Gotcha, yes this makes sense if LKJCorr already returns the correlation matrix components rather than the Cholesky factors.

Works like a charm and it’s blazing fast!
Thanks again A TON!
