MutableData Container - Dimensions for LKJCholeskyCov Distribution

Hi there!

I am new to PyMC and have started implementing a multivariate normal model. I am trying to use the PyMC data container (pm.Data) where possible, so that out-of-sample prediction is easier across a range of models.

However, I am unable to do this for data with an MvNormal distribution. The code below works if I comment out the pm.Data container and pass the feature DataFrame directly as observed, and I can then recover the true parameters easily:

import numpy as np
import pandas as pd
import arviz as az
import pymc as pm
print(f"Running on PyMC v{pm.__version__}") #Running on PyMC v5.11.0

RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)

N = 10000
mu_actual = np.array([1.0, -2.0])
sigmas_actual = np.array([0.7, 1.5])
Rho_actual = np.array([[1.0, -0.4], [-0.4, 1.0]])
# Covariance = diag(sigmas) @ correlation @ diag(sigmas)
Sigma_actual = np.diag(sigmas_actual) @ Rho_actual @ np.diag(sigmas_actual)

x = rng.multivariate_normal(mu_actual, Sigma_actual, size=N)
df_features = pd.DataFrame(x)

# Assign Model
coords_mutable = {"T": df_features.index, "N": df_features.columns}
coords = {"features_i": df_features.columns, "features_j": df_features.columns}
N_tmp = df_features.shape[1] # Passing the "N" coord as n does not work for LKJCholeskyCov, so fix the feature dimension up front

with pm.Model( coords_mutable = coords_mutable, coords = coords ) as MvNormal:
    # Assign data container
    feature_array = pm.Data("feature", df_features, mutable=True, dims = ("T", "N") )
    # Assign distributions
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n = N_tmp, eta = 2.0, sd_dist=pm.Exponential.dist(1.0)
    )
    cov = pm.Deterministic("cov", chol.dot(chol.T), dims=("features_i", "features_j"))
    mu = pm.Normal("mu", mu=0.0, sigma=10.0, dims="N")
    obs = pm.MvNormal("obs", mu = mu, chol = chol, observed = feature_array, dims = ("T", "N"))
    
    # Commenting out the two lines above (feature_array = ... and obs = ...) and using the line below instead works out of the box
    #obs = pm.MvNormal("obs", mu = mu, chol = chol, observed = df_features, dims = ("T", "N"))
MvNormal.to_graphviz()

with MvNormal:
    trace = pm.sample(
        random_seed=rng,
        idata_kwargs={"dims": {"chol_stds": ["axis"], "chol_corr": ["axis", "axis_bis"]}},
    )
# Rewrite failure due to: local_blockwise_alloc 
# node: Blockwise{Tri{dtype='float64'}, (),(),()->(o00,o01)}(Alloc.0, Alloc.0, [0])
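
For reference, the covariance used to simulate the data above and the chol.dot(chol.T) line in the model rest on the same decomposition: a covariance matrix is the correlation matrix scaled by the standard deviations, and its lower Cholesky factor reproduces it. A minimal NumPy-only sketch of that relationship follows; the names L, stds_back and corr_back are made up for illustration, and no PyMC is involved:

```python
import numpy as np

sigmas = np.array([0.7, 1.5])
rho = np.array([[1.0, -0.4], [-0.4, 1.0]])

# Covariance = diag(sigmas) @ correlation @ diag(sigmas)
Sigma = np.diag(sigmas) @ rho @ np.diag(sigmas)

# Lower-triangular Cholesky factor, so that L @ L.T gives back Sigma
L = np.linalg.cholesky(Sigma)

# Recover standard deviations and correlations from the covariance
stds_back = np.sqrt(np.diag(Sigma))
corr_back = Sigma / np.outer(stds_back, stds_back)

print(np.allclose(L @ L.T, Sigma))
print(np.allclose(stds_back, sigmas))
print(np.allclose(corr_back, rho))
```

This is the same trio of quantities (Cholesky factor, correlations, standard deviations) that the model unpacks as chol, corr and stds.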

  1. Is there a way to make this work with the MutableData Container so I can switch out data easily?
  2. Also, how do I assign the n argument in LKJCholeskyCov via the coords dict? I seem to be unable to do that unfortunately.

I scoured Google for similar cases, but without success. Let me know if there is something I can do to solve these issues.

What version of PyMC are you on? If it is not the latest, can you try updating and check again?

We fixed some similar-looking issues recently. That said, the rewrite failure by itself shouldn't cause your code to fail; it just looks ugly.

Hey Ricardo,

Updating to 5.15.1 did the trick and it worked (had 5.11 and 5.15.0 before) - thank you!

Is there a way to define the argument ‘n’ in LKJCholeskyCov via coords? I seem to be unable to just set

chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n = "N", ...
    )

in the example above.

n = len(coords["N"])?
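
Spelled out against the example above, that suggestion might look like the sketch below. Note that in the original model the "N" labels were passed via coords_mutable rather than coords, so the lookup has to use whichever dict actually holds that key. Plain lists stand in for df_features.columns here (an assumption to keep the snippet dependency-free), and the PyMC call is left as a comment copied from the model above:

```python
# Stand-in for df_features.columns in the example (two features)
feature_labels = [0, 1]

coords_mutable = {"N": feature_labels}
coords = {"features_i": feature_labels, "features_j": feature_labels}

# Derive the integer n from the coordinate labels instead of hard-coding it
n = len(coords_mutable["N"])

# The "features_i" coordinate has the same length, so it would work too
assert n == len(coords["features_i"])

# Inside the model, the integer is then passed as before:
# chol, corr, stds = pm.LKJCholeskyCov(
#     "chol", n=n, eta=2.0, sd_dist=pm.Exponential.dist(1.0)
# )
print(n)
```

This keeps the coordinate labels as the single source of truth for the feature dimension, while n itself is still handed over as a plain integer.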

Thank you!