Using a mixture of multivariate normal distributions as a prior


#1

Following on from "Using a Multivariate Normal as a prior", is it possible to use a mixture of multivariate normals as a prior? Can pm.Mixture even be used to define a prior distribution? It often seems to require observed values.


#2

The current Mixture class is not optimal for multivariate distributions (although it has gotten much better lately thanks to @lucianopaz). If you are using it as a prior, make sure you specify the shape carefully.


#3

It is possible in the current master branch on GitHub. You can do something like the following:

with pm.Model():
    # note: covariance matrices must be positive definite, so use correlations below 1
    component0 = pm.MvNormal('c0', mu=[0, 0], cov=[[1, 0.9], [0.9, 1]], shape=2)
    component1 = pm.MvNormal('c1', mu=[0, 0], cov=[[1, -0.9], [-0.9, 1]], shape=2)
    w = pm.Beta('w', 1, 1)
    mix = pm.Mixture('mix', w=[w, 1 - w], comp_dists=[component0.distribution, component1.distribution], shape=2)
    obs = pm.Normal('obs', mu=mix[0] * x + mix[1], sigma=1, observed=y)
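For intuition, the generative process such a mixture prior encodes can be sketched in plain NumPy: pick a component with probability w, then draw from that component's multivariate normal. The weights, means, and covariances below are hypothetical, and the per-sample loop is written for clarity rather than speed.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters: two correlated components (correlations of
# +/-0.9 keep the covariances positive definite) and a fixed weight w
w = 0.5
mus = [np.zeros(2), np.zeros(2)]
covs = [np.array([[1., 0.9], [0.9, 1.]]),
        np.array([[1., -0.9], [-0.9, 1.]])]

def draw_mixture_prior(n):
    """Pick a component with probability w, then draw from its MvNormal."""
    choices = rng.random(n) < w  # True -> component 0
    out = np.empty((n, 2))
    for i, use_first in enumerate(choices):
        k = 0 if use_first else 1
        out[i] = rng.multivariate_normal(mus[k], covs[k])
    return out

samples = draw_mixture_prior(5000)
```

Plotting `samples` shows the two overlapping, oppositely correlated modes that the pm.Mixture prior represents.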


#4

Thanks very much. That works very well. Any idea when the next release that includes this will be?


#5

We had a brief discussion last week on whether we could bundle together a nightly-release. I’m not sure when that would be, nor when 3.7 will be released. @colcarroll, @twiecki, @ferrine, do you have any updates on nightly and 3.7 releases?


#6

Thanks.

Sorry to add another question, but do you know if it's also possible to use the mixture within a pm.Bound object, to set lower and upper bounds on the distribution?


#7

I think we should push out 3.7 asap.


#8

I haven’t tried, but in principle it should be possible. Let us know if you run into any trouble.
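Conceptually, bounding truncates the distribution to a box. A quick way to preview what a bounded MvNormal prior looks like is rejection sampling in NumPy. This is just a sketch with hypothetical parameters, not how pm.Bound is implemented (pm.Bound works on the log-probability instead).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bounded MvNormal: an unbounded distribution plus a box
mu = np.array([0., 0.])
cov = np.array([[1., 0.9], [0.9, 1.]])
lower = np.array([0., 0.])
upper = np.array([5., 5.])

# Rejection sampling: draw from the unbounded MvNormal and keep only
# the draws that fall inside the box [lower, upper]
draws = rng.multivariate_normal(mu, cov, size=20000)
inside = np.all((draws >= lower) & (draws <= upper), axis=1)
bounded = draws[inside]

print(inside.mean())  # fraction of draws that survive the truncation
```

The surviving draws are exact samples from the truncated distribution, which is what a bounded prior should look like when sampled correctly.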


#9

I’ve tried adding bounds and it seems to lead to something a bit odd. Below is some example code:

%matplotlib inline

from matplotlib import pyplot as pl
import numpy as np
import pymc3 as pm

# set some example data (which is uninformative)
x = np.linspace(0, 9, 10)
sigma = 100.
data = sigma*np.random.randn(10)  # broad (uninformative) Gaussian data

mus = [[0., 0.], [5., 5.]]  # means of Gaussians
covs = [np.array([[1., 0.], [0., 1.]]), np.array([[1., 0.9], [0.9, 1.]])]  # covariances

weights = [0.7, 0.3]  # weights of each mode

lowerbounds = [0., 0.]
upperbounds = [5., 5.]

shape = len(mus[0])  # number of dimensions

# set test values in the centre of the bounds
testvals = np.add(lowerbounds, np.subtract(upperbounds, lowerbounds)/2.)

with pm.Model():
    # define bounded MvNormal
    BoundedMvN = pm.Bound(pm.MvNormal, lower=lowerbounds, upper=upperbounds)

    comp_dists = []
    for i, mu, cov in zip(range(shape), mus, covs):
        comp_dists.append(BoundedMvN('c{}'.format(i), mu=mu, cov=cov, shape=shape).distribution)
    mix = pm.Mixture('mix', w=weights, comp_dists=comp_dists, shape=shape, testval=testvals)

    obs = pm.Normal('obs', mu=mix[0] * x + mix[1], sd=sigma, observed=data)

    trace = pm.sample(2000)

If I plot the output:

pl.plot(trace['mix'][:,0], trace['mix'][:,1], '.')

The resulting plot shows that it only seems to have sampled from one of the modes of the mixture.

If I remove the bounds, e.g., using:

with pm.Model():
    comp_dists = []
    for i, mu, cov in zip(range(shape), mus, covs):
        comp_dists.append(pm.MvNormal('c{}'.format(i), mu=mu, cov=cov, shape=shape).distribution)
    mix = pm.Mixture('mix', w=weights, comp_dists=comp_dists, shape=shape, testval=testvals)

    obs = pm.Normal('obs', mu=mix[0] * x + mix[1], sd=sigma, observed=data)

    trace = pm.sample(2000)

It samples from both modes, as expected.

Do you have any idea what might be the issue?


#10

I should add that if I plot the individual components c0 and c1, they do seem to be sampled correctly; it's just the mixture mix that only shows one component.


#11

And if you use sample_prior_predictive instead, do you still get just a single component?


#12

Using sample_prior_predictive gives the following error:

ValueError: Drawing samples from distributions with array-valued bounds is not supported.

#13

Hmm, I’ll try to look into the problem when I have some time. The sample_prior_predictive part will take some time, so I’ll start with the inference part, where sampling with no data gets stuck in a single component.


#14

I’m actually running the snippet you supplied, and it works fine for me with a conda-installed pymc3 3.6. It feels weird to say that the supplied example runs too well :smiley::

import arviz as az

data = az.from_pymc3(trace)
az.plot_joint(data.posterior, var_names=['mix'], coords={'mix_dim_0': np.array([0, 1])});



#15

Playing around a bit more, it is possible NUTS was just having trouble mixing on your first run. I reran with target_accept=0.99, since there were a lot of divergences, and got a better mix of samples. There are still plenty of divergences, which I assume come from the boundary conditions.

Here’s the new marginal plot. The scatter is hard to distinguish in the first plot, but you can see differences in the marginal distributions. I’m using a custom stylesheet here!


#16

Cool! Trying again it looks like I get decent sampling of both modes now. It must have just been that my initial attempt got a bit stuck!


#17

Just a quick note - I think part (or maybe all!) of the reason I originally only got one mode was because the likelihood wasn’t entirely uninformative and therefore led to one mode being favoured. If I make the data “noise free” then I think it more consistently returns both modes.