Hi there!
I am new to PyMC and have started implementing a multivariate normal model. Where possible, I try to use the pm.Data container so that out-of-sample prediction is easier across a range of models.
However, I am unable to do this for data with an MvNormal distribution. The code below works if I comment out the pm.Data container and pass the feature DataFrame directly as observed, and I can then recover the true parameters easily:
import numpy as np
import pandas as pd
import arviz as az
import pymc as pm
print(f"Running on PyMC v{pm.__version__}") #Running on PyMC v5.11.0
RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)
N = 10000
mu_actual = np.array([1.0, -2.0])
sigmas_actual = np.array([0.7, 1.5])
Rho_actual = np.matrix([[1.0, -0.4], [-0.4, 1.0]])
Sigma_actual = np.diag(sigmas_actual) * Rho_actual * np.diag(sigmas_actual)
x = rng.multivariate_normal(mu_actual, Sigma_actual, size=N)
df_features = pd.DataFrame(x)
# Assign Model
coords_mutable = {"T": df_features.index, "N": df_features.columns}
coords = {"features_i": df_features.columns, "features_j": df_features.columns} #, "obs_id": np.arange(T)}
N_tmp = df_features.shape[1]  # For some reason 'N' is not working for Cholesky, so I fix the feature dimension a priori
with pm.Model(coords_mutable=coords_mutable, coords=coords) as MvNormal:
    # Assign data container
    feature_array = pm.Data("feature", df_features, mutable=True, dims=("T", "N"))
    # Assign distributions
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=N_tmp, eta=2.0, sd_dist=pm.Exponential.dist(1.0)
    )
    cov = pm.Deterministic("cov", chol.dot(chol.T), dims=("features_i", "features_j"))
    mu = pm.Normal("mu", mu=0.0, sigma=10.0, dims="N")
    obs = pm.MvNormal("obs", mu=mu, chol=chol, observed=feature_array, dims=("T", "N"))
    # Commenting out the line 1. feature_array = ... and 2. obs = .... and replacing it with below works out of the box
    #obs = pm.MvNormal("obs", mu=mu, chol=chol, observed=df_features, dims=("T", "N"))
MvNormal.to_graphviz()
with MvNormal:
    trace = pm.sample(
        random_seed=rng,
        idata_kwargs={"dims": {"chol_stds": ["axis"], "chol_corr": ["axis", "axis_bis"]}},
    )
# Rewrite failure due to: local_blockwise_alloc
# node: Blockwise{Tri{dtype='float64'}, (),(),()->(o00,o01)}(Alloc.0, Alloc.0, [0])
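As a side note on the simulated data: the covariance in my script is built as diag(sigmas) · Rho · diag(sigmas). Here is a minimal sketch of the same construction with plain NumPy arrays and the @ operator instead of np.matrix (which NumPy no longer recommends), assuming the same values as above. The names sigma_true and sample_cov are just for this illustration.

```python
import numpy as np

# Same true parameters as in the script above
mu_true = np.array([1.0, -2.0])
sigmas = np.array([0.7, 1.5])
rho = np.array([[1.0, -0.4], [-0.4, 1.0]])

# Sigma = D @ Rho @ D with D = diag(sigmas); '@' is matrix multiplication,
# so this matches what '*' does for np.matrix objects
D = np.diag(sigmas)
sigma_true = D @ rho @ D

# Draw samples and compare the empirical covariance with the true one
rng = np.random.default_rng(8927)
x = rng.multivariate_normal(mu_true, sigma_true, size=10_000)
sample_cov = np.cov(x, rowvar=False)

print(sigma_true)
print(sample_cov)
```

This only checks the data-generating step and does not touch the pm.Data problem itself.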
- Is there a way to make this work with the MutableData Container so I can switch out data easily?
- Also, how do I assign the n argument in LKJCholeskyCov via the coords dict? I seem to be unable to do that unfortunately.
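The only workaround I can think of for the second point is to read the length off the coords dict before building the model and pass it as a plain integer. A rough sketch with a placeholder array follows. The name n_features is made up for this example, and I do not know whether LKJCholeskyCov can take its size from dims directly.

```python
import numpy as np

# Placeholder standing in for my feature data: 5 observations, 2 features
x = np.zeros((5, 2))

# Coordinates for the two feature axes, as in my model
feature_labels = list(range(x.shape[1]))
coords = {"features_i": feature_labels, "features_j": feature_labels}

# Derive the matrix size from the coords instead of hard-coding it
n_features = len(coords["features_i"])

# n_features would then be passed on as a plain int, e.g.
# pm.LKJCholeskyCov("chol", n=n_features, eta=2.0, sd_dist=pm.Exponential.dist(1.0))
print(n_features)
```

This at least keeps n tied to the coords, although it is still fixed when the model is built.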
I scoured Google for similar cases but was not successful. Let me know if there is something I can do to solve these issues.