Learning covariances using LKJCholeskyCov at the group level in a hierarchical multivariate Normal model

I’m interested in using pm.LKJCholeskyCov() to learn group-level covariance matrices for a hierarchical MVN generative process, but can’t figure out how to learn the covariance matrix at the group (rather than global) level.

Example data generating process: The observations x and y belong to two groups: A and B. We know which group each observation corresponds to. Additionally, within each group the data are generated from a 2D multivariate Normal with different means & covariances. I want to estimate each group’s mean & covariance with a hierarchical model.

For estimating the means, this process is straightforward when constructing the pm.MvNormal() specification: we simply specify a (2, 1) x 2 prior (a 2D vector of means for each of the 2 groups):

mu_group = pm.Normal("mu_group", mu=0, sigma=1, 
                     dims=("groups", "features"))

For estimating the covariances, I would like to follow the same process and use pm.LKJCholeskyCov() to construct a (2, 2) x 2 prior (a 2x2 covariance matrix for each of the 2 groups).

However, there does not seem to be a clear way to initialize a covariance-matrix prior along a given dimension. Instead, I am constrained to learning the “global” covariance matrix without any specification of the grouping structure. Is there a simple workaround to learn covariance matrices at the group level rather than globally in PyMC?

I’ve included full code for a reprex below (running PyMC 5.15.1). Sampling from the model reveals that we “correctly” learn the group means as (0, 0) and (1, -1) respectively, but learn a single global covariance instead of one per group.

import pymc as pm
import numpy as np
import pandas as pd

sample_size = 1000

mean1 = [0, 0] 
mean2 = [1, -1]
cov1 = [[1, 0], [0, 1]]  # covariance matrix 1
cov2 = [[1, -0.5], [-0.5, 1]] # covariance matrix 2

sample1 = np.random.multivariate_normal(mean1, cov1, sample_size)
sample2 = np.random.multivariate_normal(mean2, cov2, sample_size)
groups = np.repeat(["A", "B"], sample_size)

sample = np.concatenate([sample1, sample2])

df = pd.DataFrame(data = {
    "group" : groups,
    "x" : sample[:, 0], 
    "y" : sample[:, 1],
})

group_values, group_names = pd.factorize(df["group"])
features = ["x", "y"]

coords = {
    "groups": group_names,
    "features": features
}

df_numeric = df[features]

with pm.Model(coords=coords) as model:

    ###########################
    # Specifying Data objects #
    ###########################

    group_idx = pm.Data("group_idx", value=group_values)
    observations = pm.Data("observations", value=df_numeric) 

    ##########
    # Priors #
    ##########

    mu_group = pm.Normal("mu_group", mu=0, sigma=1, dims=("groups", "features"))

    #######################
    # Deterministic terms #
    #######################

    mu = mu_group[group_idx]

    ##################################
    # Setting up the Cholesky matrix #
    ##################################

    # LKJ Cholesky decomposition.
    sd_dist = pm.Exponential.dist(1, shape=len(features))

    chol, _, _ = pm.LKJCholeskyCov(
        "chol",
        n=len(features),
        eta=2,
        sd_dist=sd_dist,
        compute_corr=True,
        store_in_trace=False,
    )

    cov = pm.Deterministic("cov", chol.dot(chol.T))

    #######################
    # Specifying our data #
    #######################

    obs = pm.MvNormal("obs", mu=mu, chol=chol, observed=observations, shape=df_numeric.shape)

The way your model is written, groups A and B are generated by entirely different distributions. The most direct solution is to respect this fact in your PyMC model and use two observation distributions:

    import pytensor.tensor as pt  # used for pt.eq and pt.concatenate below

    # One LKJCholeskyCov prior and one MvNormal likelihood per group.
    observed_vars = []
    for i, group in enumerate(group_names):
        sd_dist = pm.Exponential.dist(1, shape=len(features))

        chol, _, _ = pm.LKJCholeskyCov(
            f"{group}_chol",
            n=len(features),
            eta=2,
            sd_dist=sd_dist,
            compute_corr=True,
            store_in_trace=False,
        )

        cov = pm.Deterministic(f"{group}_cov", chol.dot(chol.T))

        #######################
        # Specifying our data #
        #######################
        # Number of observations in this group, computed symbolically from group_idx.
        n = pt.eq(group_idx, i).sum()
        obs = pm.MvNormal(
            f"{group}_obs",
            mu=mu_group[i],
            chol=chol,
            observed=df_numeric.values[group_values == i],
            shape=(n, 2),
        )
        observed_vars.append(obs)

    # Stitch the per-group observations back together in group order.
    observed_data = pm.Deterministic("obs", pt.concatenate(observed_vars, axis=0))
    idata = pm.sample()
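
Sampling this version learns a separate covariance matrix for each group. To inspect the recovered covariances, you could do something like the following (a sketch, assuming ArviZ is available; the A_cov/B_cov names come from the f"{group}_cov" pattern above):

import arviz as az

# Posterior summaries of the per-group covariance matrices.
az.summary(idata, var_names=["A_cov", "B_cov"])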

Alternatively, you could use block_diag to fuse the two covariance matrices together and ravel your mu_group, so that you have $\mu = \begin{bmatrix} \mu_{A,x} & \mu_{A,y} & \mu_{B,x} & \mu_{B,y} \end{bmatrix}^T$ and $\Sigma = \begin{bmatrix} \Sigma_A & 0 \\ 0 & \Sigma_B \end{bmatrix}$. I thought this was slightly less elegant because you will have to pad your data with zeros (each row needs 4 columns, one for each group-feature pair, and you would have to decide what to do with the unobserved components). I believe the solution I presented above more closely reflects your data generating process.
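
For reference, here is a minimal sketch of that block-diagonal construction (the model and variable names are illustrative, and the zero-padding of the data is not shown):

import pymc as pm
import pytensor.tensor as pt

with pm.Model() as blockdiag_sketch:
    mu_group = pm.Normal("mu_group", mu=0, sigma=1, shape=(2, 2))

    # One Cholesky factor per group, as in the loop above.
    chol_A, *_ = pm.LKJCholeskyCov(
        "A_chol", n=2, eta=2, sd_dist=pm.Exponential.dist(1, shape=2)
    )
    chol_B, *_ = pm.LKJCholeskyCov(
        "B_chol", n=2, eta=2, sd_dist=pm.Exponential.dist(1, shape=2)
    )

    # Ravel the means into [mu_Ax, mu_Ay, mu_Bx, mu_By].
    mu_flat = mu_group.flatten()

    # Assemble the (4, 4) block-diagonal covariance from the per-group covariances.
    Sigma = pt.zeros((4, 4))
    Sigma = pt.set_subtensor(Sigma[:2, :2], chol_A @ chol_A.T)
    Sigma = pt.set_subtensor(Sigma[2:, 2:], chol_B @ chol_B.T)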

One note: you can’t use pm.Data for the observed data in this case, unless you split it by groups ahead of time (e.g. make observed_A and observed_B). This is because PyMC only accepts leaf nodes as observed data (much to my chagrin).
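
If you do want swappable data, the split containers would look something like this sketch (names are illustrative):

with model:
    # One leaf Data container per group, so each can be passed as `observed`.
    observed_A = pm.Data("observed_A", df_numeric.values[group_values == 0])
    observed_B = pm.Data("observed_B", df_numeric.values[group_values == 1])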