How do I limit the diagonal of the covariance matrix to a specific parameter?

\mathbf{\Sigma} = \begin{pmatrix} \sigma_{1}^2 & \cdots & \sigma_{1}\sigma_{T} \\ \vdots & \ddots & \vdots \\ \sigma_{1}\sigma_{T} & \cdots & \sigma_{T}^2 \end{pmatrix}, \quad \text{where } t = 1, \dots, T, \quad \text{constrain } \sigma_1^2 = \cdots = \sigma_T^2 = 1

The LKJ distribution is usually used as a prior for the covariance matrix, like below. But if I want to fix some values in the covariance matrix, for example, set all of the variances to 1 (diag = 1), how do I do that?

with pm.Model() as m:
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=5, eta=2.0, sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    pm.Deterministic("cov", chol.dot(chol.T))
    mean = pm.Normal("mean", 0, 10, shape=5)
    pm.MvNormal("y", mean, chol=chol, shape=(1000, 5))

Can you use tt.fill_diagonal()?


If you want to constrain the covariance matrix like this, you’re going to have to deal with the fact that not every resulting covariance matrix will be positive definite, so you will need to define your priors in a very specific way.

To illustrate, think about decomposing the covariance matrix as described in footnote 16 of the Stan manual: \Sigma = \text{Diag}(\sigma) \times \Omega \times \text{Diag}(\sigma), where \sigma is a k \times 1 vector of standard deviations and \Omega is a k \times k matrix of correlation coefficients. In principle, the diagonal of \Omega should be all ones, but we can think about a degenerate family of \Omega where we set the diagonal such that the diagonal of \Sigma is all ones.

Take the case where k=2, and define \Omega = \begin{bmatrix} a & b \\ c & d \end{bmatrix} and \sigma = [\sigma_1, \sigma_2]^T. This gives:

\Sigma = \begin{bmatrix}\sigma^2_1 a & \sigma_1\sigma_2 b \\ \sigma_1 \sigma_2 c & \sigma^2_2 d \end{bmatrix}

Since we want to restrict the diagonal of \Sigma to 1, define a = \frac{1}{\sigma_1^2} and d = \frac{1}{\sigma_2^2}; then we can set b = c = 1.
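With these choices the diagonal of \Sigma is exactly 1 and the off-diagonals are \sigma_1\sigma_2:

\Sigma = \begin{bmatrix} 1 & \sigma_1\sigma_2 \\ \sigma_1\sigma_2 & 1 \end{bmatrix}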

In PyMC it might look like this:

with pm.Model() as m:
    sigma = pm.Uniform('sigma', lower=1e-6, upper=1 - 1e-6, shape=2)
    sigma_diag = tt.diag(sigma)

    # start from an all-ones matrix, then overwrite the diagonal with sigma ** -2,
    # so that Diag(sigma) @ omega @ Diag(sigma) has ones on its diagonal
    omega = tt.as_tensor(np.full((2, 2), 1.0, dtype='float32'))
    omega = tt.set_subtensor(omega[np.diag_indices(2)], sigma ** (-2))
    cov = pm.Deterministic('cov', tt.dot(tt.dot(sigma_diag, omega), sigma_diag))

    mean = pm.Normal("mean", 0, 10, shape=2)
    # data is assumed to be a (1000, 2) array of observations
    pm.MvNormal("y", mu=mean, cov=cov, shape=(1000, 2), observed=data)

As I mentioned, however, this family of covariance matrices is degenerate, and will not always be positive definite. To see this, we solve for eigenvalues \lambda using this characteristic polynomial:

\lambda^2 - (\sigma_1^2 a + \sigma^2_2 d) \lambda + (ad - cb)\sigma^2_1 \sigma^2_2 = 0

If you plug this into the quadratic formula, you will find that the positive branch is always positive, but the negative branch is only positive when ad > cb, that is, when the determinant (ad - cb)\sigma^2_1 \sigma^2_2 is positive. We require both branches – that is, both eigenvalues – to be positive in order for this to be a valid covariance matrix. Using the values for a, b, c, d found above, this amounts to:

\sigma_1 \sigma_2 < 1

Obviously this is only for the k=2 case, but in general you are always going to have a system of k-1 inequalities that you need to respect in order to get a valid covariance matrix. If all the values of \sigma are less than 1 you should be OK (at least it seems so from some quick numerical tests; I can't prove it). If any of them go above 1, you will have to check that the system of inequalities holds.
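To make that concrete, here is a quick numerical sanity check for the k=2 case (the \sigma values below are just hypothetical examples):

import numpy as np

def restricted_cov(sigma_1, sigma_2):
    # unit diagonal, off-diagonals sigma_1 * sigma_2 (the b = c = 1 construction above)
    return np.array([[1.0, sigma_1 * sigma_2],
                     [sigma_1 * sigma_2, 1.0]])

# sigma_1 * sigma_2 < 1: both eigenvalues positive, so a valid covariance matrix
print(np.linalg.eigvalsh(restricted_cov(0.5, 0.8)))  # [0.6  1.4]

# sigma_1 * sigma_2 > 1: one eigenvalue is negative, so not positive definite
print(np.linalg.eigvalsh(restricted_cov(1.5, 0.9)))  # [-0.35  2.35]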

So you could implement all this, and place priors on your \sigma variables to ensure they stay in a “good” region, but at some point you have to stop and ask yourself “why am I doing this? Is it really necessary?”

Edit:

I was thinking about this more and you can interpret the off-diagonal elements of a restricted covariance matrix like the one you proposed as correlation coefficients, so I went through all that to show that correlations are between 0 and 1. Not exactly covering myself in glory here.

Here’s my shot at redemption:

k = 4

true_cov = np.random.uniform(0, 0.3, size=(k, k))
true_cov= true_cov @ true_cov.T
true_cov[np.diag_indices(k)] = 1.0

mu = np.ones(k)
data = stats.multivariate_normal(mean=mu, cov=true_cov).rvs(1000)

with pm.Model() as m:    
    bounded_halfnorm = pm.Bound(pm.HalfNormal, 0, 1)
    
    # need 1 + 2 + ... + (k-1) = k*(k-1)/2 correlation coefficients
    rhos = bounded_halfnorm('rhos', sigma=0.1, shape=(k * (k + 1) // 2) - k)
        
    cov = tt.eye(k)
    cov = tt.set_subtensor(cov[np.tril_indices(k, -1)], rhos )
    
    # np.triu_indices enumerates entries in a different order than np.tril_indices,
    # so reverse the (row, col) tuple instead to fill the upper triangle symmetrically
    cov = tt.set_subtensor(cov[np.tril_indices(k, -1)[::-1]], rhos)
    cov = pm.Deterministic('cov', cov)

    mean = pm.Normal("mean", np.zeros(k), np.ones(k), shape=k)
    y_hat = pm.MvNormal("y" ,mu=mean, cov=cov, shape=(1000, k), observed=data)
    
    prior = pm.sample_prior_predictive(var_names=['cov', 'y'])
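This prior keeps each pairwise correlation small, but it still does not guarantee positive definiteness, so it is worth checking the prior draws; something like this, using the prior dictionary returned by sample_prior_predictive above:

# fraction of prior draws of cov that are positive definite (all eigenvalues > 0)
eigs = np.linalg.eigvalsh(prior['cov'])
print((eigs > 0).all(axis=-1).mean())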

But I need information about the covariance, and I need to make sure that the matrix is positive definite.

The formula to convert a correlation matrix to a covariance matrix is \Sigma = \text{Diag}(\sigma) \Omega \text{Diag}(\sigma) where \sigma is the vector of standard deviations. Since you believe all the \sigma^2 = \sigma = 1, this becomes \Sigma =I \Omega I = \Omega. If you restrict the diagonal to be one and require the matrix to be positive definite, the covariance matrix and correlation matrix are the same object.
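As a tiny numeric illustration (the numbers here are made up):

import numpy as np

omega = np.array([[1.0, 0.3],
                  [0.3, 1.0]])   # correlation matrix
sd = np.ones(2)                  # standard deviations, all 1

cov = np.diag(sd) @ omega @ np.diag(sd)
print(np.allclose(cov, omega))   # True: with unit sds, covariance == correlation

# with non-unit sds the two differ
print(np.diag([2.0, 0.5]) @ omega @ np.diag([2.0, 0.5]))  # [[4.  0.3], [0.3  0.25]]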


I first generated the correlation matrix \Omega with LKJ prior, and then converted it into a positive definite covariance matrix through Diag(\sigma) \Omega Diag(\sigma). Is this the right way?

triu_idx = np.array(np.triu_indices(5))
# row and column indices of the upper triangle, without the diagonal, shape (2, -1)
idx_rm_diag = triu_idx[:, triu_idx[0] != triu_idx[1]]

with pm.Model() as m:
    corr_array = pm.LKJCorr("corr", n=5, eta=2.0)
    # transform vector to matrix
    # upper triangular matrix without diag
    cor_tri = tt.set_subtensor(tt.zeros((5,5))[idx_rm_diag[0],idx_rm_diag[1]],corr_array)
    
    # corr matrix
    corr_mat = cor_tri + tt.eye(5) + cor_tri 
    sigma = tt.ones(5) # Or other standard deviation specified
    sigma_mat = tt.diag(sigma)
    cov = pm.Deterministic("cov",(sigma_mat).dot(corr_mat).dot(sigma_mat))
    mean = pm.Normal("mean",0,10,shape=(5))
    pm.MvNormal("y",mean,cov=cov,shape=(1000,5))

Yeah, looks good. It looks like you're missing a transpose, though: that line should be corr_mat = cor_tri + tt.eye(5) + cor_tri.T.

I made a couple of simplifications. Most importantly, np.triu_indices and np.tril_indices have a parameter k to specify a diagonal offset, so you don't need to eliminate the main diagonal yourself. You also don't need to pack the indices into an array and unpack them again; you can index with the returned tuple directly.

n = 4
triu_idx = np.triu_indices(n=n, k=1)
N_samples = data.shape[0]  # number of observed rows

with pm.Model() as m:
    corr_array = pm.LKJCorr("corr", n=n, eta=1.0)
    corr_mat = tt.set_subtensor(tt.zeros((n, n))[triu_idx], corr_array)
    corr_mat += corr_mat.T + tt.eye(n)

    sigma_diag = tt.eye(n) * 1.0  # or another diagonal of standard deviations
    cov = pm.Deterministic("cov", (sigma_diag.dot(corr_mat)).dot(sigma_diag))

    mean = pm.Normal("mean", 0, 10, shape=n)

    y_hat = pm.MvNormal("y",
                        mu=mean,
                        cov=cov,
                        shape=(N_samples, n),
                        observed=data)

I sampled it against some artificial data and it seemed to work fine.
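For reference, the artificial-data setup might look something like this (a sketch reusing the data-generation idea from the edit further up; the specific numbers are made up):

import numpy as np
from scipy import stats

n = 4
N_samples = 1000

# a unit-diagonal "true" covariance to simulate from
true_cov = np.random.uniform(0, 0.3, size=(n, n))
true_cov = true_cov @ true_cov.T
true_cov[np.diag_indices(n)] = 1.0

data = stats.multivariate_normal(mean=np.ones(n), cov=true_cov).rvs(N_samples)

# build the model above with this data, then:
# with m:
#     trace = pm.sample()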


Yeah, I forgot to transpose it. Thank you for correcting me.
And the numpy functions make the code more efficient~ :smile:
