I cannot recover the correct parameters from an end-to-end simulation. I assume the data follow a 5-dimensional Gaussian, generated like this:
import numpy as np
import pymc as pm

mus = np.array([4, 2, 3, 1, 6])  # true mean values
K = mus.shape[0]
N = 100  # total number of data points
sigmas = np.eye(5) + 0.3 * np.random.random((5, 5))  # "covariance" matrix; not symmetric or positive definite, but NumPy samples anyway (with a warning)
data = np.random.multivariate_normal(mus, sigmas, size=N)  # data generation
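As written, `sigmas` is neither symmetric nor guaranteed positive definite, so it is not a valid covariance matrix. A minimal sketch of one way to build a guaranteed-valid covariance of the same flavor (the `A @ A.T` construction is my assumption, not part of the original setup):

```python
import numpy as np

rng = np.random.default_rng(0)

mus = np.array([4, 2, 3, 1, 6])
A = 0.3 * rng.random((5, 5))
sigmas = np.eye(5) + A @ A.T  # symmetric positive definite by construction

# Cholesky succeeds only for symmetric positive-definite matrices
L = np.linalg.cholesky(sigmas)

data = rng.multivariate_normal(mus, sigmas, size=100)
# with only 100 points the empirical mean is close to, not equal to, mus
print(np.round(data.mean(axis=0), 1))
```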
I specify the same model for inference:
with pm.Model() as modelCorr:
    # Prior over the component means
    # (renamed to mus_ so the true `mus` array is not clobbered)
    mus_ = pm.MvNormal('mu_', mu=pm.floatX(np.zeros(K)), tau=pm.floatX(0.1 * np.eye(K)), shape=(K,))
    # Cholesky-factored LKJ prior over the covariance matrix
    L = pm.LKJCholeskyCov('packed_L_', n=K, eta=2., sd_dist=pm.HalfCauchy.dist(1))[0]
    # Convert L to sigma for convenience
    sigma = pm.Deterministic('sigma_', L.dot(L.T))
    # Specify the likelihood
    obs = pm.MvNormal('like', mu=mus_, chol=L, observed=data)
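For intuition on the Cholesky parameterization used above: any lower-triangular L with a positive diagonal yields a valid covariance Σ = L Lᵀ, and `np.linalg.cholesky` recovers L from Σ. A quick NumPy check with illustrative values (not taken from the model):

```python
import numpy as np

# an arbitrary lower-triangular matrix with positive diagonal
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.2, 0.0],
              [0.3, 0.1, 0.9]])

sigma = L @ L.T  # symmetric positive definite by construction

# the Cholesky factorization round-trips back to L
L_back = np.linalg.cholesky(sigma)
print(np.allclose(L_back, L))  # → True
```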
The model builds without errors and I can sample with:
with modelCorr:
    traceCorr = pm.sample(chains=8)
Then I draw posterior-predictive samples (PPCs) with:
with modelCorr:
    est = pm.sample_posterior_predictive(traceCorr, var_names=['mu_', 'packed_L_'])
and finally I simulate the expected distribution of the data:
x_mu = np.asarray(est.posterior_predictive.mu_)
x_L = np.asarray(est.posterior_predictive.packed_L_)
# reshape to stack all the chains
xx_mu = x_mu.reshape(-1, x_mu.shape[-1])
xx_L = x_L.reshape(-1, x_L.shape[-1])
# simulate one data point per posterior draw
n_draws = xx_mu.shape[0]  # renamed from N to avoid clobbering the data-set size
estimation = np.zeros(xx_mu.shape)
for n in range(n_draws):
    estimation[n, :] = pm.MvNormal.dist(mu=xx_mu[n, :], chol=pm.expand_packed_triangular(5, xx_L[n, :], lower=True)).eval()
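For reference, `pm.expand_packed_triangular` just unpacks a flat vector into a lower-triangular matrix. A NumPy stand-in, assuming row-major packing of the lower triangle (`[L00, L10, L11, L20, ...]`), is:

```python
import numpy as np

def unpack_lower(n, packed):
    """NumPy stand-in for pm.expand_packed_triangular(n, packed, lower=True).
    Assumes row-major packing of the lower triangle: [L00, L10, L11, L20, ...]."""
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = packed
    return L

packed = np.arange(1.0, 7.0)  # 6 = 3*(3+1)/2 entries for n = 3
print(unpack_lower(3, packed))
# [[1. 0. 0.]
#  [2. 3. 0.]
#  [4. 5. 6.]]
```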
The posterior means, averaged over all draws, are already way off: array([-0.06958857, -0.01141641, -0.00285485, -0.00243548, 0.01582915]), i.e. essentially zero instead of the true (4, 2, 3, 1, 6). And of course, the simulated data then have a very wide spread.