Drawing from posterior of a Multivariate Distribution


#1

I have a multivariate distribution parametrized by a vector mu and a Cholesky decomposed covariance matrix sigma = L.L^T. Pretty much the same as in the docs.

Now I want to update the prior whenever I have new data available, but the examples do not cover how to do this for cases where the parameters are vector valued.

I looked at this question which discusses the problem but no clear solution is provided.

What is the right way to do this with PyMC?

Is it possible to apply from_posterior to each axis of the trace samples and then join the Interpolated distributions? How would this join/concatenation work?


#2

You mean the updating prior using interpolation? Yeah that wouldnt work in your case with a vector prior.
But maybe you can parameterized on the things like eta of LKJCorr and update that instead?


#3

Oh maybe try something like:


#4

Thank you, I think I mostly get it.

I can get the means of each set of trace samples for Xi in X = [X1, …Xn] and then use a student-t distribution to approximate the actual distribution over X.

Assuming that my understand is correct, there are two questions:

  1. How would I provide the n mean values to a single pm.StudentT?
  2. How would the covariance (sd?) be modeled?

#5

Maybe it would be easier if you can share your code and indicate which are the parameters you would like to do posterior update.


#6

A notebook can be found here.

I doubt that the way I’ve done this in the notebook is correct because I wasn’t sure what the right way to approximate LKJCholeskyCov was.


#7

Hmmm i see. In theory you can always approximate the latent freeRVs (not necessary the RV you wrote down, as pymc3 sample in the nonbounded real space) with a distribution, but it would involves changing the model quite a bit…


#8

Thanks for your reply. Could you please elaborate a little more? I understand your comment about approximating latent free RVs but do you have any ideas on how I could go about modeling that?


#9

@junpenglao I re-parametrized my model, like so:

with pm.Model() as model1:
    mu = pm.InverseGamma('mu', 3., 1., shape=shape)
    eta = pm.Gamma('eta', 3., 1.)
    sd_dist = pm.HalfCauchy('sd_dist', 2.5)
    
    packed_L = pm.LKJCholeskyCov('packed_L', n=shape, eta=eta, sd_dist=sd_dist)
    L = pm.expand_packed_triangular(shape, packed_L)
    sigma = pm.Deterministic('sigma', L.dot(L.T))

    pm.MvNormal('observation', mu, sigma, observed=observed)

But this gives the following error:

~/.local/lib/python3.5/site-packages/pymc3/distributions/multivariate.py in logp(self, x)
    981         sd_vals = tt.sqrt(variance)
    982 
--> 983         logp_sd = self.sd_dist.logp(sd_vals).sum()
    984         corr_diag = x[diag_idxs] / sd_vals
    985 

AttributeError: 'TransformedRV' object has no attribute 'logp'

It seems that sd_dist must always be a pm.Distribution, wouldn’t it be more useful if I could model it like all the other parameters?