Modeling question regarding incorporating prior into my likelihood

Hi all, I have a modeling question. My data likelihood is a matrix normal distribution whose mean, row covariance, and column covariance are each D×D. I'm putting a unimodal mixture-of-normals prior (a continuous, smooth version of a spike-and-slab prior) on each edge of my network G, so that the posterior captures a sparse G. Since I want this sparsity in the posterior, I'm trying nutpie because I've heard it's fast, but the following code does not work. In the actual setting the model inputs look different; below is example data.

  1. Am I incorporating the prior into the likelihood properly, so that my model follows P(G | R^, S^) ∝ P(R^ | G, S^) × P(G | S^)?

  2. Since I'm imposing sparsity through the prior, is nutpie enough, or should I use something like a normalizing flow?

import pymc as pm
import numpy as np
import pandas as pd

# Load Data
G = pd.read_csv('G.csv', header=None).values
R_hat = pd.read_csv('R.csv', header=None).values

# Convert to NumPy arrays
D = G.shape[0]  # Dimensionality of G and R_hat
pi_0 = np.sum(np.abs(G) < 1e-5) / (D * D)  # Compute global sparsity
U = np.eye(D)  # Example row covariance matrix
V = np.eye(D)  # Example column covariance matrix
epsilon = 1e-5  # Numerical stability
sigma_0 = 0.1  # Variance for absent edges
sigma_k = np.linspace(0.01, 10, 100)  # Variance grid for mixture normal prior

with pm.Model() as pymc_model:
    # PRIOR: Define P(G | S^)
    z = pm.Bernoulli('z', p=pi_0, shape=(D, D))  # z = 1 -> edge absent (global sparsity pi_0)

    # Absent edges: Normal(0, sigma_0)
    absent_edge = pm.Normal('absent_edge', mu=0, sigma=np.sqrt(sigma_0), shape=(D, D))

    # Present edges: Mixture of Normals
    K = len(sigma_k)
    pi_k = np.ones(K) / K  # Uniform weights over sigma_k
    component_distributions = [
        pm.Normal.dist(mu=0, sigma=np.sqrt(sigma)) for sigma in sigma_k
    ]
    present_edge = pm.Mixture(
        'present_edge',
        w=pi_k,
        comp_dists=component_distributions,
        shape=(D, D),
    )

    # Combine absent and present edges based on sparsity indicator
    G_edge = pm.Deterministic('G_edge', z * absent_edge + (1 - z) * present_edge)

    # LIKELIHOOD: Define P(R_hat | G, S^)
    L_U = np.linalg.cholesky(U + epsilon * np.eye(D))
    L_V = np.linalg.cholesky(V + epsilon * np.eye(D))
    pm.MatrixNormal(
        'R_hat_obs',
        mu=G_edge,  # Latent network G is the mean of the observed R_hat
        rowchol=L_U,  # Pass the Cholesky factors directly instead of rebuilding U, V
        colchol=L_V,
        observed=R_hat,
    )

# Sampling
# NOTE: a pure-NUTS sampler such as nutpie cannot handle the discrete Bernoulli z;
# pm.sample assigns z a compound step (e.g. BinaryGibbsMetropolis) instead.
with pymc_model:
    trace = pm.sample(
        draws=1000,
        chains=4,
        target_accept=0.9,
        return_inferencedata=True,
    )