Help with Multivariate Normal with nested prior information


Hi There,

Over the last few weeks I’ve been trying to translate a (somewhat cryptic) paper from the 80’s and implement a version of the problem in PyMC3 and leverage of variational inference but I am somewhat stuck and would love it if someone could point me in the right direction.

The problem is to, using bayes theorem, estimate theta ~ MVN(mu,cov=V).

We have prior estimates of mu and V but also (and more importantly) want to incorporate prior information of another p length vector g’s relationship to each of the n values of mu. This relationship is explained via a p x n matrix H who’s values represent the proportion of each value mu that makes up this row’s value of g. So that each observation g equals the corresponding row sum of H * mu (element wise). In addition the uncertainty and covariance of the observations g were also specified by sigma and hence this prior information was given the term Y ~ MVN(H * theta, cov = sigma)

The paper gives a toy problem and solution. This is what I’ve got so far to try and replicate it.

mu0 = np.array([5.9433962,

H.shape = (4,6)

with Model() as Maher_model:
    prior = MvNormal('prior', mu=mu0, cov=V0, shape=6)
    likelihood = MvNormal('likelihood',,mu0), cov=sigma, shape=4, observed=g)
    samples2 = fit(random_seed=RANDOM_SEED).sample(1000)

Their answers are posterior mu = (9.88, 3.43, 4.75, 5.09, 6.41, 7.27).T and the variances of V are (1.64, 1.76, 1.88, 1.85, 1.76, 2.45).T

I realised pretty quickly that there a few big gaps in my knowledge:

  1. how would I represent these nested observations i.e. g and the way in which g relates to my posterior?
  2. how do I incorporate prior information when it is a complete representation of the posterior distribution? All the examples I’ve seen seem to just put priors on the parameters of the likelihood.

Sorry if these are stupid questions - I really appreciate any support or advice I can get.



Currently, in your model there is no free parameter, in another word, there is nothing to be estimated. You are basically sampling from a MvNormal(mu0, V0).

To represent prior information in the model, you need to represent them as distribution, and let them interact with each other, for example, you can do:

with pm.Model() as Maher_model:
    theta = pm.MvNormal('theta', mu=mu0, cov=V0, shape=6)
    H0 = pm.Uniform('H0', 0, 1, shape=(4, 6))
    g =, theta)
    likelihood = pm.MvNormal('likelihood', mu=g, cov=sigma, shape=4, observed=g)

where the prior information about g is represented using theta and H0.


Thank you Junpeng!

That really helped! I think I’m almost there.
doh forgot to include in my description that g is observed (g=np.array([19.2,20.8,10.0,13.0]) - incorporating this prior info is the whole motivation whoops.
Even though it makes sense for H to be stochastic I’ve kept H as determined for now (cos i’m trying to replicate the example):

with pm.Model() as Maher_model:
    theta = pm.MvNormal('theta', mu=mu0, cov=V0*500, shape = 6)
    #H0 = pm.Normal('H0', mu = H, sd=5, shape=(4, 6))
    Ymu =, theta)
    funcY = pm.MvNormal('funcY', mu=Ymu, cov=sigma, observed=g)
    samples3 =

It seems replicate the estimates for theta answers OK:

mu = (7.4, 4.4, 4.9, 6.2, 6.5, 7.0) with fairly low SDs

but there is something I can’t understand. I would have expected that when I scaled up the covariance V0 that then the posterior means would change to not consider mu0 as much and also they would have a large standard deviation??? For some reason it doesn’t seem to be influencing very much at all. Am I missing something?


It does make a difference with different prior, you should increase the iteration number as your model is likely not yet converge with the default:

with Maher_model:
    approx =
# array([ 13.97158126,   2.21738182,   3.77380906,   4.27692797,  4.87950232,  10.64871266])
# array([ 0.71632807,  1.1325099 ,  1.33168908,  0.46198214,  0.51430541, 2.21560691])

It would be even more obvious (and more accurate) if you sample from the model:

with Maher_model:
    trace = pm.sample()


You are absolutely right! I must admit I thought I was increasing the iterations by increasing my samples on the fit method but yes fit was just defaulting out to 10k and not converging… Running perfectly now! pm.sample() seems to get to the right answer way faster (even with the default 1k). I had wrongly assumed fit method is always faster.

Learning so much, thanks to PyMC3.

Thanks again!