Consider the model
Y: Observed variables following a multivariate Normal distribution with shape (N, K) i.e. N observations with dimension K.
mu: A vector representing the mean of the multivariate Normal, with shape (K,).
cov: A matrix representing the covariance of the multivariate Normal, with shape (K, K).
The issue I have is that for some observations, the Y[i, :]
vector is not fully observed e.g. we do not observed the final value Y[i, -1]
. We still know that
Y[i, :-1] ~ MVN(mu[:-1], cov[:-1, :-1]
so the model should technically still work mathematically. How do I deal with the fact there can be unknown values in the observed varaible when coding a pymc3
model?
Here is my code in the case where it is assumed there are no missing observed values.
K = Y.shape[1]
with pm.Model() as model:
# Hyper-parameters.
sd_Sigma = pm.HalfCauchy.dist(beta=2.5, shape=K)
# Priors.
mu = pm.Normal('mu_alpha', 0, 1e4, shape=K)
L_packed = pm.LKJCholeskyCov('packed_L', n=K, eta=2., sd_dist=sd_Sigma)
L = pm.expand_packed_triangular(K, L_packed)
# Define likelihood.
obs = pm.MvNormal('obs', mu, chol=L, observed=Y)
with model:
trace = pm.sample(draws=3000, cores=1, n_init=1000)