How to marginalize on a Multivariate Normal distribution


#1

I am new to both probabilistic modeling and PyMC so please bear with me.

I have a model that learns a multivariate normal distribution, and now I’d like to find the probabilities over a partial test input.

My problem is very similar to the one described in this question. Specifically, I refer to the final comment, quoted below:

you need to work out the conditional probability of ma. by marginalizing the MvNormal using [.5, 1.2] and cov, and generate from this new conditional probability (also a Gaussian) instead.

How can I achieve this with PyMC?

Further, is it also possible to add this new observation and update the posterior of the Multivariate distribution?


#2

Seems it is actually much easier than I thought(!) From Wikipedia:

To obtain the marginal distribution over a subset of multivariate normal random variables, one only needs to drop the irrelevant variables (the variables that one wants to marginalize out) from the mean vector and the covariance matrix. The proof for this follows from the definitions of multivariate normal distributions and linear algebra.

Example

Let X = [X1, X2, X3] be multivariate normal random variables with mean vector μ = [μ1, μ2, μ3] and covariance matrix Σ (standard parametrization for multivariate normal distributions). Then the joint distribution of X′ = [X1, X3] is multivariate normal with mean vector μ′ = [μ1, μ3] and covariance matrix

\boldsymbol\Sigma' = \begin{bmatrix} \boldsymbol\Sigma_{11} & \boldsymbol\Sigma_{13} \\ \boldsymbol\Sigma_{31} & \boldsymbol\Sigma_{33} \end{bmatrix}
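In plain NumPy, with made-up values for μ and Σ just for illustration, the marginalization really is just index selection:

```python
import numpy as np

# Example joint distribution over X = [X1, X2, X3] (made-up values)
mu = np.array([0.0, 1.0, 2.0])
cov = np.array([[1.0, 0.3, 0.2],
                [0.3, 2.0, 0.5],
                [0.2, 0.5, 1.5]])

# Marginal over X' = [X1, X3]: keep rows/columns 0 and 2
keep = [0, 2]
mu_marg = mu[keep]
cov_marg = cov[np.ix_(keep, keep)]  # same as cov[[0, 2], :][:, [0, 2]]

print(mu_marg)   # [0. 2.]
print(cov_marg)  # [[1.  0.2]
                 #  [0.2 1.5]]
```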

And yes, it is also possible to add this new observation to the model, something like pm.MvNormal('marg', mu[0, 2], cov[[0, 2], :][:, [0, 2]], observed = x02)


#3

Thank you for your reply. However, it is still not clear to me how I might do this with PyMC.

Also, what I am really looking for is something like P(X1|X2=p) where the distribution is over X = [X1, X2, X3]. That is, I want to find the distribution over X1 when I only know that X2 is equal to some arbitrary p.


#4

Also, I don’t think indexing with mu[0, 2] is allowed. It throws IndexError: too many indices for array


#5

You should change it to mu[[0, 2]] instead.
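The difference, with a toy 1-d array:

```python
import numpy as np

mu = np.array([0.1, 0.2, 0.3])  # a 1-d mean vector

print(mu[[0, 2]])  # fancy indexing with a list of indices: [0.1 0.3]

try:
    mu[0, 2]  # two indices into a 1-d array
except IndexError as e:
    print(e)  # "too many indices for array ..."
```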


#6

For conditional distribution see: https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions

So I would say you should first drop the irrelevant variables of X = [X1, X2, X3] to get the marginal distribution X’ = [X1, X2], and then compute the conditional P(X1|X2=p).


#7

Thanks, I was in fact reading the same article.

So say I had a distribution over X = [X0, X1, X2, X3, X4] and my test input was [--, --, 0.5, 0.6, --]. To find P(X0|X2=0.5, X3=0.6), I would do the following:

  1. Marginalize over X’ = [X0, X2, X3] to get a distribution with parameters μ’ and Σ’.
  2. The parameters for the required conditional will then be found using the method as described in the linked Wikipedia article.
  3. Now I plug these parameters into pm.MvNormal and sample.

Have I understood this correctly?
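To make sure I follow, here is how I would sketch those three steps in plain NumPy/SciPy, with made-up values for μ and Σ (in the real model they would come from the posterior):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Made-up parameters for X = [X0, X1, X2, X3, X4]
mu = np.zeros(5)
A = rng.normal(size=(5, 5))
cov = A @ A.T + 5 * np.eye(5)   # guaranteed positive definite

# Step 1: marginalize to X' = [X0, X2, X3]
keep = [0, 2, 3]
mu_m = mu[keep]
cov_m = cov[np.ix_(keep, keep)]

# Step 2: condition on X2 = 0.5, X3 = 0.6 (indices 1, 2 within X')
obs_idx, rest_idx = [1, 2], [0]
p = np.array([0.5, 0.6])
s11 = cov_m[np.ix_(rest_idx, rest_idx)]
s12 = cov_m[np.ix_(rest_idx, obs_idx)]
s22 = cov_m[np.ix_(obs_idx, obs_idx)]
s22_inv = np.linalg.inv(s22)
mu_cond = mu_m[rest_idx] + s12 @ s22_inv @ (p - mu_m[obs_idx])
cov_cond = s11 - s12 @ s22_inv @ s12.T

# Step 3: sample X0 | X2 = 0.5, X3 = 0.6
samples = stats.multivariate_normal(mu_cond, cov_cond).rvs(1000, random_state=rng)
```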


#8

On a related note, finding the conditional covariance matrix requires inverting Σ_22. Can that be achieved directly with PyMC?

Some example code I’ve worked out, using matrix_inverse from theano.tensor.nlinalg since there is no plain inverse():

from theano.tensor.nlinalg import matrix_inverse

new_data = np.array([0.5, 0.0, 0.6, 0.0, 0.0])
mask = np.where(new_data != 0)[0]  # indices of the observed variables
infer = 1                          # index of the variable to infer

with model:
    s22_inv = matrix_inverse(sigma[mask, :][:, mask])
    # conditional mean: mu_1 + S_12 S_22^-1 (x_2 - mu_2)
    _mu = mu[[infer]] + sigma[[infer], :][:, mask].dot(s22_inv.dot(new_data[mask] - mu[mask]))
    # conditional covariance: S_11 - S_12 S_22^-1 S_21
    _sigma = sigma[[infer], :][:, [infer]] - sigma[[infer], :][:, mask].dot(s22_inv.dot(sigma[mask, :][:, [infer]]))

#9

So I noticed something strange while trying to write my model (which isn’t correct as yet and I would really appreciate your inputs), I’ve posted my notebook in this gist.

What’s happening is that when I run the line with model: trace2 = pm.sample(50, njobs=4), I get a shape mismatch error:

~/.local/lib/python3.5/site-packages/pymc3/sampling.py in init_nuts(init, chains, n_init, model, random_seed, progressbar, **kwargs)
   1426         var = np.ones_like(mean)
   1427         potential = quadpotential.QuadPotentialDiagAdapt(
-> 1428             model.ndim, mean, var, 10)
   1429     elif init == 'advi+adapt_diag_grad':
   1430         approx = pm.fit(

~/.local/lib/python3.5/site-packages/pymc3/step_methods/hmc/quadpotential.py in __init__(self, n, initial_mean, initial_diag, initial_weight, adaptation_window, dtype)
    119         if initial_diag is not None and len(initial_diag) != n:
    120             raise ValueError('Wrong shape for initial_diag: expected %s got %s'
--> 121                              % (n, len(initial_diag)))
    122         if len(initial_mean) != n:
    123             raise ValueError('Wrong shape for initial_mean: expected %s got %s'

ValueError: Wrong shape for initial_diag: expected 77 got 76

(The one extra dimension corresponds to the conditional)

To check what was being passed to model.dict_to_array(vals), I printed start.keys() before line 1425 in sampling.py.

Oddly enough the error went away and I was able to run the inference without any trouble, as you will see in the notebook.

cc @junpenglao @lucianopaz (as you’re both devs)


#10

I’m away from my PC, so I just looked through your gist but did not run it. You’re doing some weird indexing on your arrays: mask is an array of integers, but sometimes you seem to use it as a boolean mask, and other times you do something like sigma[mask, :], which applies integer indexing along the first axis.
Maybe you could take a quick glance at the numpy basic indexing documentation to make sure that what you are writing is doing what you want it to be doing.
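For example, with a toy 3x3 array:

```python
import numpy as np

sigma = np.arange(9).reshape(3, 3)
idx = np.array([0, 2])           # a list of integer indices, not a boolean mask

# Chained indexing: first pick rows, then columns -> the 2x2 submatrix
print(sigma[idx, :][:, idx])     # [[0 2], [6 8]]

# np.ix_ builds the same submatrix in one step
print(sigma[np.ix_(idx, idx)])   # [[0 2], [6 8]]

# Passing two integer arrays directly pairs them element-wise instead,
# giving a 1-d result (here, the diagonal entries):
print(sigma[idx, idx])           # [0 8]
```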


#11

To find the conditional and the marginal you need the full covariance matrix, which I am assuming you are trying to infer from the data. So here is what I would do:

  1. Build a model to find \mu and \Sigma.
  2. Sample from the posterior.
  3. Use the posterior samples to build the conditional posterior, and generate posterior predictive samples from it. You can sample from the joint conditional P(X0, X1, X4|X2=0.5, X3=0.6) and keep only X0.

I put up a small notebook as demonstration: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/PyMC3QnA/discourse_2265.ipynb
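Roughly, step 3 looks like this in NumPy/SciPy. The posterior draws below are made up to stand in for the real trace from pm.sample, and conditional_params is an illustrative helper, not PyMC API:

```python
import numpy as np
from scipy import stats

def conditional_params(mu, cov, obs_idx, obs_val):
    """Mean/cov of the remaining variables given X[obs_idx] = obs_val
    (standard MVN conditioning formulas)."""
    n = len(mu)
    rest = [i for i in range(n) if i not in obs_idx]
    s22_inv = np.linalg.inv(cov[np.ix_(obs_idx, obs_idx)])
    s12 = cov[np.ix_(rest, obs_idx)]
    mu_bar = mu[rest] + s12 @ s22_inv @ (obs_val - mu[obs_idx])
    cov_bar = cov[np.ix_(rest, rest)] - s12 @ s22_inv @ s12.T
    return mu_bar, cov_bar

# Hypothetical posterior draws of mu and cov; in practice these come
# from the trace returned by pm.sample.
rng = np.random.default_rng(1)
mu_draws = rng.normal(size=(100, 5))
cov_draws = np.broadcast_to(np.eye(5) * 2.0 + 0.5, (100, 5, 5))

obs_idx, obs_val = [2, 3], np.array([0.5, 0.6])
ppc = []
for m, c in zip(mu_draws, cov_draws):
    mu_bar, cov_bar = conditional_params(m, c, obs_idx, obs_val)
    # draw (X0, X1, X4) jointly, keep only X0 (index 0 of the rest)
    draw = stats.multivariate_normal(mu_bar, cov_bar).rvs(random_state=rng)
    ppc.append(draw[0])
ppc = np.asarray(ppc)
```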


#12

There is no boolean indexing here; mask is actually not a mask (my bad, I know), it’s just a list of the indices we want the marginal over. Although, I do agree that it’s a huge mess that’s difficult to understand.

Thanks for your help :slight_smile:


#13

This is great, thank you! I’ve had trouble wrapping my head around PyMC but I think I’m starting to get the hang of it.

I noticed that the second time, when sampling from the conditional, you didn’t use pm.sample. Why is that?


#14

pm.sample is for sampling from the posterior; here it is a posterior predictive sample, which is based on the sampled parameters output by pm.sample. Using scipy to sample is easier (even internally, sample_posterior_predictive also mostly relies on scipy).