Multivariate normal distribution with dependent covariance and advi

Hi

I am trying to implement an inference model that is a combination of a multivariate normal model and a hierarchical regression model. Let me describe the problem a bit first:

I have two features v1 and v2. Let’s say I know that they are both normally distributed and I already know their means, mu(v1) and mu(v2), and variances, var(v1) and var(v2).

I also know that these two variables are not independent and show a considerable covariance (Let’s say they are correlated). Now my aim is to fit a bayesian model to predict the covariance between the two variables.

However, there’s a catch. I have reason to believe that the covariance is actually dependent on another continuous variable c1. I would like to model this dependence as a spline function, i.e, the covariance varies based on the value of c1.

Note: it is also known that c1 impacts mu(v1), mu(v2), var(v1), and var(v2) and the model takes this into account.

So here’s a piece of (pseudo)code that I wrote trying to fit a variational inference model to approximate the covariance (the data size is really big so I would prefer ADVI):

# first, standardize all variables to center the distributions around zero
v1_standardized = (v1 - v1.mean()) / v1.std()
v2_standardized = (v2 - v2.mean()) / v2.std()
vs_standardized = np.concatenate([v1_standardized[:, np.newaxis], v2_standardized[:, np.newaxis]], axis=1)
c1_standardized = (c1 - c1.mean()) / c1.std()

# Splines to model nonlinear effects of c1
# number of spline nuts (could be tuned)
num_knots = 3
knot_list = np.quantile(c1_standardized, np.linspace(0, 1, num_knots))
# create b spline basis for regression using patsy
B_spline_c1 = patsy.dmatrix(
    "bs(c1_standardized, knots=knots, degree=3, include_intercept=True) - 1",
    {"c1_standardized": c1_standardized, "knots": knot_list[1:-1]},
)

coords = {
    "splines": np.arange(B_spline_age.shape[1]),
    "obs_id": np.arange(len(v1_standardized)),
}

advi_model_cov = pm.Model(coords=coords)

with advi_model_cov:
    # minibatch variables
    c1_standardized_t = pm.Minibatch(c1_standardized, 100,)
    B_spline_c1_t = pm.Minibatch(B_spline_c1, 100)
    vs_standardized_t = pm.Minibatch(vs_standardized, 100)

    # Priors (for covariance)

    # c1 splines
    w_c1_cov = pm.Normal("w_c1_cov", mu=0, sigma=10, size=B_spline_c1.shape[1], dims="splines")

    # Estimated covariance
    cov_est = pm.Deterministic("cov_est", pm.math.dot(B_spline_c1_t, w_c1_cov.T))

    # Priors (constant priors, already given)

    # c1 splines (the spline weights are given)
    w_c1_v1 = pm.ConstantData("w_c1_v1", np.array(given_w_c1_v1), dims="splines")
    w_c1_v2 = pm.ConstantData("w_c1_v2", np.array(given_w_c1_v2), dims="splines")

    # Estimated means
    est_v1 = pm.Deterministic("est_v1", pm.math.dot(B_spline_c1_t, w_c1_v1.T))
    est_v2 = pm.Deterministic("est_v2", pm.math.dot(B_spline_c1_t, w_c1_v2.T))

    # Model variance

    # nonlinear c1 dependent variance (spline weights are given)
    w_var_c1_v1 = pm.ConstantData("w_var_c1_v1", np.array(given_w_var_c1_v1), dims="splines")
    w_var_c1_v2 = pm.ConstantData("w_var_c1_v2", np.array(given_w_var_c1_v2), dims="splines")

    # Variance estimate
    var_v1 = pm.Deterministic("var_v1", pm.math.dot(B_spline_c1_t, w_var_c1_v1.T))
    var_v2 = pm.Deterministic("var_v2", pm.math.dot(B_spline_c1_t, w_var_c1_v2.T))

    # Construct the mean vector and covariance matrix for MvNormal to fit a bivariate normal
    bivariate_mu = pm.Deterministic("bivariate_mu", at.as_tensor_variable([est_v1, est_v2]).T)
    covariance_matrix = pm.Deterministic(
        "covariance_matrix",
        at.stacklists([[var_v1, cov_est], [cov_est, var_v2]]).T
    )

    # Likelihood estimation from a bivariate normal with known mean and variance, but unknown covariance
    likelihood = pm.MvNormal(
        "likelihood",
        mu=bivariate_mu,
        cov=covariance_matrix,
        observed=vs_standardized_t,
        total_size=len(v1_standardized),
    )

    # run ADVI with minibatch
    approx_cov = pm.fit(100000, callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)])

    # sample from trace
    advi_model_idata_cov = pm.sample_prior_predictive()
    advi_model_idata_cov.extend(approx_cov.sample(2000))
    pm.sample_posterior_predictive(advi_model_idata_cov, extend_inferencedata=True)

Apologies for this very long script (believe me this is actually a simplified version of the script I’m working with, but does summarize the gist of my question).

So when I run the model above, I get the following error:

ValueError: cov must be two dimensional.

I believe this means that MvNormal distribution cannot accept a dependent covariance matrix that is different for every observed datapoint. However, I don’t know what I could do to make it work.

Please let me know if you have any ideas on how best to solve this problem.

Hi so to post an update,

I noticed there was an issue in my previous model in that covariance was not bounded which is mathematically incorrect, so I changed a few things (I tried feeding the unbounded linear value for correlation to a sigmoid to bound it), and also tried implementing the MvNormal using a Cholesky factorization:

# first, standardize all variables to center the distributions around zero
v1_standardized = (v1 - v1.mean()) / v1.std()
v2_standardized = (v2 - v2.mean()) / v2.std()
vs_standardized = np.concatenate([v1_standardized[:, np.newaxis], v2_standardized[:, np.newaxis]], axis=1)
c1_standardized = (c1 - c1.mean()) / c1.std()

# Splines to model nonlinear effects of c1
# number of spline nuts (could be tuned)
num_knots = 3
knot_list = np.quantile(c1_standardized, np.linspace(0, 1, num_knots))
# create b spline basis for regression using patsy
B_spline_c1 = patsy.dmatrix(
    "bs(c1_standardized, knots=knots, degree=3, include_intercept=True) - 1",
    {"c1_standardized": c1_standardized, "knots": knot_list[1:-1]},
)

coords = {
    "splines": np.arange(B_spline_age.shape[1]),
    "obs_id": np.arange(len(v1_standardized)),
}

advi_model_cov = pm.Model(coords=coords)

with advi_model_cov:
    # minibatch variables
    c1_standardized_t = pm.Minibatch(c1_standardized, 100,)
    B_spline_c1_t = pm.Minibatch(B_spline_c1, 100)
    vs_standardized_t = pm.Minibatch(vs_standardized, 100)

    # Priors (for covariance)

    # c1 splines
    w_c1_rho = pm.Normal("w_c1_cov", mu=0, sigma=10, size=B_spline_c1.shape[1], dims="splines")

    # Estimated covariance
    rho_est = pm.Deterministic("cov_est", 2 * pm.math.sigmoid(pm.math.dot(B_spline_c1_t, w_c1_cov.T)) - 1 )

    # Priors (constant priors, already given)

    # c1 splines (the spline weights are given)
    w_c1_v1 = pm.ConstantData("w_c1_v1", np.array(given_w_c1_v1), dims="splines")
    w_c1_v2 = pm.ConstantData("w_c1_v2", np.array(given_w_c1_v2), dims="splines")

    # Estimated means
    est_v1 = pm.Deterministic("est_v1", pm.math.dot(B_spline_c1_t, w_c1_v1.T))
    est_v2 = pm.Deterministic("est_v2", pm.math.dot(B_spline_c1_t, w_c1_v2.T))

    # Model variance

    # nonlinear c1 dependent variance (spline weights are given)
    w_var_c1_v1 = pm.ConstantData("w_var_c1_v1", np.array(given_w_var_c1_v1), dims="splines")
    w_var_c1_v2 = pm.ConstantData("w_var_c1_v2", np.array(given_w_var_c1_v2), dims="splines")

    # Variance estimate
    var_v1 = pm.Deterministic("var_v1", pm.math.dot(B_spline_c1_t, w_var_c1_v1.T))
    var_v2 = pm.Deterministic("var_v2", pm.math.dot(B_spline_c1_t, w_var_c1_v2.T))

    # Construct the mean vector and covariance matrix for MvNormal to fit a bivariate normal
    bivariate_mu = pm.Deterministic("bivariate_mu", at.as_tensor_variable([est_v1, est_v2]).T)
    cholesky_decomposition = pm.Deterministic("cholesky_decomposition", at.as_tensor_variable([est_v1, at.math.mul(est_v2, rho_est), at.math.mul(est_v2, at.math.sqrt(1 - rho_est**2)),]).T)

    # Likelihood estimation from a bivariate normal with known mean and variance, but unknown covariance
    likelihood = pm.MvNormal(
        "likelihood",
        mu=bivariate_mu,
        chol=cholesky_decomposition,
        observed=vs_standardized_t,
        total_size=len(v1_standardized),
    )

    # run ADVI with minibatch
    approx_cov = pm.fit(100000, callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)])

    # sample from trace
    advi_model_idata_cov = pm.sample_prior_predictive()
    advi_model_idata_cov.extend(approx_cov.sample(2000))
    pm.sample_posterior_predictive(advi_model_idata_cov, extend_inferencedata=True)

I need to note that I used the bivariate cholesky decomposition as described in this lecture note.

Now when I run this I get an error when pm.fit is executed to perform ADVI. I get the following error:

AssertionError: Could not broadcast dimensions
Apply node that caused the error: Assert{msg=Could not broadcast dimensions}(Abs.0, AND.0)
Toposort index: 261
Inputs types: [ScalarType(int64), ScalarType(bool)]
Inputs shapes: [(), ()]
Inputs strides: [(), ()]
Inputs values: [100, False]
Outputs clients: [[TensorFromScalar(Assert{msg=Could not broadcast dimensions}.0)]]

Does anyone know what I’m doing wrong?

P.S. check this issue for a sample code to reproduce the error.