# Covariance matrix inference

Dear all, I would like to kindly ask you for your support. I have tried to perform the inference with covariance matrix which is not hardcoded as matrix for n variables as the input and I failed. I am using the PyMC v5.3.0. Honestly, I have no idea how to solve it. Please find the code bellow which I used:

``````n = 3 # number of variables
data = np.random.randn(5, n) # example data with 5 observations and n variables

with pm.Model() as pearson_model_n:

# Priors for the mean and standard deviation of each variable are weakly informative
mean = pm.Normal('mean', mu=0, sigma=10, shape=n)
sigma = pm.HalfNormal('sigma', sigma=10, shape=n)

# Prior for the correlation matrix
packed_L = pm.LKJCholeskyCov('packed_L', n=n, eta=2., sd_dist=pm.Exponential.dist(1.))
L = pm.expand_packed_triangular(n, packed_L)
cov = pm.Deterministic('cov', L.dot(L.T))

# Calculate the Pearson correlation matrix
corr = pm.Deterministic('corr', cov / np.outer(sigma, sigma))

# Define the multivariate normal distribution
y_pred_n = pm.MvNormal('y_pred', mu=mean, cov=cov, observed=data)

# Perform Bayesian inference
trace_pn = pm.sample(draws=2000, chains=4, tune=1000, random_seed=42)
``````

Here is the error I recieved:

``````AttributeError                            Traceback (most recent call last)
Cell In[1011], line 11
9 # Prior for the correlation matrix
10 packed_L = pm.LKJCholeskyCov('packed_L', n=n, eta=2., sd_dist=pm.Exponential.dist(1.))
---> 11 L = pm.expand_packed_triangular(n, packed_L)
12 cov = pm.Deterministic('cov', L.dot(L.T))
14 # Calculate the Pearson correlation matrix

File c:\Users\uknown\anaconda3\envs\pymc202305\Lib\site-packages\pymc\math.py:432, in expand_packed_triangular(n, packed, lower, diagonal_only)
409 def expand_packed_triangular(n, packed, lower=True, diagonal_only=False):
410     r"""Convert a packed triangular matrix into a two dimensional array.
411
412     Triangular matrices can be stored with better space efficiency by
(...)
430         If true, return only the diagonal of the matrix.
431     """
--> 432     if packed.ndim != 1:
433         raise ValueError("Packed triangular is not one dimensional.")
434     if not isinstance(n, int):

AttributeError: 'tuple' object has no attribute 'ndim'
``````

I checked the dimension/type of my input:
data shape: (5, 3)
data type: <class ânumpy.ndarrayâ>

The default behavior of `pm.LKJCholeskyCov` was changed somewhere around v4. It now returns a 3-tuple of `(chol, corr, stds)`, which are, in turn, the cholesky factorized covariance matrix, the correlation matrix, and the diagonal of the standard deviation matrix.

That means you can change your code as follows:

``````    L, corr, std = pm.LKJCholeskyCov('packed_L', n=n, eta=2., sd_dist=pm.Exponential.dist(1.))
cov = pm.Deterministic('cov', L.dot(L.T))
``````

You can also pass `L` directly to `pm.MvNormal` using the `chol` argument, instead of computing the covariance matrix. Internally, the `MvNormal` will Cholesky factorize it again, so this is actually computationally better.

If you want the old behavior of `pm.LKJCholeskyCov`, you can pass `compute_corr=False`. In this case, it will return only the packed Cholesky covariance matrix.

1 Like

Many thanks for your quick reply, I reworked the script in the meantime so it partially works. The problem you mentioned I solved, please see the code bellow:

``````n = 3 # number of variables
data = np.random.randn(5, n) # example data with 5 observations and n variables

with pm.Model() as pearson_model_n:
# Priors for the mean and standard deviation of each variable are weakly informative
mean  = pm.Normal('mean', mu=0, sigma=10, shape=n)
sigma = pm.HalfNormal('sigma', sigma=10, shape=n)

# Prior for the correlation matrix
chol, corr, sigmas = pm.LKJCholeskyCov('chol_cov', eta=4, n=n, sd_dist=sigmaDist)
# compute the covariance matrix
#cov = pm.math.dot(chol, chol.T)
cov = chol.dot(chol.T)
cov = pm.Deterministic('cov', cov)

# Calculate the Pearson correlation matrix
#corrPearson = pm.Deterministic('corrPearson', cov / np.outer(sigma, sigma))

# Define the multivariate normal distribution
y_pred_n = pm.MvNormal('y_pred', mu=mean, cov=cov, observed=data)

# Perform Bayesian inference
trace_pn = pm.sample(draws=2000, chains=4, tune=1000, random_seed=42)
``````

If I uncomment

``````corrPearson = pm.Deterministic('corrPearson', cov / np.outer(sigma, sigma))
``````

it throw me an error:

``````KeyError                                  Traceback (most recent call last)
File c:\Users\hroma\anaconda3\envs\pymc202305\Lib\site-packages\pytensor\tensor\type.py:287, in TensorType.dtype_specs(self)
286 try:
--> 287     return self.dtype_specs_map[self.dtype]
288 except KeyError:

KeyError: 'object'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[1048], line 18
15 cov = pm.Deterministic('cov', cov)
17 # Calculate the Pearson correlation matrix
---> 18 corrPearson = pm.Deterministic('corrPearson', cov / np.outer(sigma, sigma))
20 # Define the multivariate normal distribution
21 y_pred_n = pm.MvNormal('y_pred', mu=mean, cov=cov, observed=data)

File c:\Users\hroma\anaconda3\envs\pymc202305\Lib\site-packages\pytensor\tensor\var.py:173, in _tensor_py_operators.__truediv__(self, other)
172 def __truediv__(self, other):
--> 173     return at.math.true_div(self, other)

File c:\Users\uknown\anaconda3\envs\pymc202305\Lib\site-packages\pytensor\graph\op.py:295, in Op.__call__(self, *inputs, **kwargs)
253 r"""Construct an `Apply` node using :meth:`Op.make_node` and return its outputs.
...
--> 289     raise TypeError(
290         f"Unsupported dtype for {self.__class__.__name__}: {self.dtype}"
291     )

TypeError: Unsupported dtype for TensorType: object
``````

I am affraid I wrongly work with arrays but I do not know how to change it to recieve the pearsons correlation coefficient. Would you be so kind and provided me suggestions how to do it ?

`pm.LKJCholeskyCov` returns the correlation matrix and adds it to your model. You donât have to do any further computation.

But your specific problem is mixing numpy and pytensor. You should use `pytensor.tensor` for math operations instead of `numpy`. This would work:

`x = cov / pt.outer(sigma, sigma)`

In many cases, pytensor is smart enough to replace* numpy functions with the correct pytensor functions, so you can get away with doing things like `np.sum` on symbolic pytensor variables. But this coverage isnât perfect, and youâve found a case where it doesnât work. For this reason, I recommend that people never use numpy operations in their models. Always use the equivalent function from `pytensor.tensor` or `pm.math`

*Technically nothing is âreplacedâ, but many numpy functions are just written in such a way that stuff happily works out.

1 Like

many thanks!

Once again, many thanks, I add final solution, wehre the pearson correlation coefficient is identical to the correlation matrix from cholesky

``````import pytensor.tensor as pt
n = 2 # number of variables
data = np.random.randn(5, n) # example data with 5 observations and n variables

with pm.Model() as pearson_model_n:
# Priors for the mean and standard deviation of each variable are weakly informative
mean  = pm.Normal('mean', mu=0, sigma=10, shape=n)
#sigma = pm.HalfNormal('sigma', sigma=10, shape=n)

# Prior for the correlation matrix
chol, corr, sigmas = pm.LKJCholeskyCov('chol_cov', eta=4, n=n, sd_dist=sigmaDist)
# compute the covariance matrix
#cov = pm.math.dot(chol, chol.T)
cov = chol.dot(chol.T)
cov = pm.Deterministic('cov', cov)

# Calculate the Pearson correlation matrix
corrPearson = pm.Deterministic('corrPearson', cov / pt.outer(sigmas, sigmas))

# Define the multivariate normal distribution
y_pred_n = pm.MvNormal('y_pred', mu=mean, cov=cov, observed=data)

# Perform Bayesian inference
trace_pn = pm.sample(draws=2000, chains=4, tune=1000, random_seed=42)
``````
1 Like