Correctly specifying a model with a multimodal distribution

Hello PyMC3 community.

I’m trying to build a model based on a multimodal distribution.

I have the following code, but am not sure that it’s correct:

  1. I’m specifying the component priors with the .dist syntax, which makes me think they’re not actually stochastic.
  2. If I plot the prior predictive and posterior predictive samples, they are not very different from each other.
import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt

# Only the shape of W is used (to set the Dirichlet concentration)
W = np.array([0.5, 0.5, 0.5])

data = np.array([1.73, 1.59, 1.64, 1.58])

with pm.Model() as model_mix:
    # Mixture weights
    w = pm.Dirichlet('w', np.ones_like(W))

    # Mixture components with fixed (non-random) parameters
    dist1 = pm.Normal.dist(mu=1.72, sigma=0.01)
    dist2 = pm.Normal.dist(mu=1.78, sigma=0.01)
    dist3 = pm.StudentT.dist(mu=1.88, sigma=0.01, nu=5)

    like = pm.Mixture('like', w=w, comp_dists=[dist1, dist2, dist3], observed=data)

    trace = pm.sample(2000, tune=8000, cores=1)

params_prior_mix = pm.sample_prior_predictive(model=model_mix, samples=2000)
params_pred_mix = pm.sample_posterior_predictive(trace, 1000, model_mix)
# 1000 posterior predictive draws x 4 observations = 4000 values
predictions_vec_mix = params_pred_mix['like'].reshape(4000)

plt.hist(np.random.choice(predictions_vec_mix, 1000), label='Posterior', color='r')
plt.hist(params_prior_mix['like'][:, 0], label='Prior')
plt.hist(data, color='pink', label='Measured Distribution')
plt.legend()

Do I need to model the distributions of the parameters of dist1, dist2, dist3 in order to correctly specify the priors?
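For concreteness, is modeling the parameters something like this? (My sketch; the hyperprior widths are just guesses.)

with pm.Model() as model_mix2:
    w = pm.Dirichlet('w', np.ones_like(W))

    # Hyperpriors on the component locations (widths are placeholder guesses)
    mu1 = pm.Normal('mu1', mu=1.72, sigma=0.05)
    mu2 = pm.Normal('mu2', mu=1.78, sigma=0.05)
    mu3 = pm.Normal('mu3', mu=1.88, sigma=0.05)

    dist1 = pm.Normal.dist(mu=mu1, sigma=0.01)
    dist2 = pm.Normal.dist(mu=mu2, sigma=0.01)
    dist3 = pm.StudentT.dist(mu=mu3, sigma=0.01, nu=5)

    like = pm.Mixture('like', w=w, comp_dists=[dist1, dist2, dist3], observed=data)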


Have you checked this notebook?

Regarding your prior and posterior samples looking similar: with only four observations, the data can’t update the prior much, so that may be expected.

I hadn’t seen this, very useful! The article refers to pymc4, but pymc3 also has the NormalMixture method. I tried it using pymc3 and the sampling is very slow. Is NormalMixture optimized in pymc4 to run faster, by chance?


The pymc3 version of that notebook is here. Sampling categorical models like these tends to be difficult due to the categorical/discrete nature of some of the parameters. However, as you can see by comparing those two notebooks, there have been changes in v4. Whether they help speed-wise, I’m not sure, but it may be worth trying both.
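For reference, the hard-to-sample formulation is the one with an explicit discrete cluster label per observation, roughly like this (my rough sketch, not the notebook’s exact code); pm.Mixture instead marginalizes that label out:

import numpy as np
import pymc3 as pm

data = np.array([1.73, 1.59, 1.64, 1.58])
k = 3

with pm.Model() as model_explicit:
    w = pm.Dirichlet('w', a=np.ones(k))
    mu = pm.Normal('mu', mu=0, sigma=10, shape=k)
    # One discrete cluster label per observation; NUTS can't sample discrete
    # variables, so PyMC falls back to a compound step with Metropolis-style
    # samplers, which is largely what makes these models slow to mix.
    z = pm.Categorical('z', p=w, shape=len(data))
    obs = pm.Normal('obs', mu=mu[z], sigma=0.01, observed=data)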

I see, the pymc4 version is considerably simpler. The generated data clearly has 3 modes, but it appears that the sigma prior has the same parameters for every component (a half-normal distribution with sigma=1).

So is the idea here to just use a very weak prior? In my case, since my data is scarce, I would want to specify a different sigma for each of the prior distributions in the mixture. How would I do that?

You may be able to just pass in a vector of sigma values when defining σ. Not 100% sure of that, but you could sample from the prior to check.
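Something along these lines, maybe (my untested sketch, PyMC v4 style to match the notebook; the sigma values are placeholders):

import numpy as np
import pymc as pm

k = 3
with pm.Model(coords={"cluster": range(k)}) as m:
    # One scale parameter per component, each with its own prior sigma
    σ = pm.HalfNormal("σ", sigma=np.array([0.025, 0.025, 0.01]), dims="cluster")
    prior = pm.sample_prior_predictive()

# Inspect the implied prior draws of σ to verify each component differs
print(prior.prior["σ"])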


So I tried the following:

data = np.array([1.73, 1.59, 1.64, 1.58])
k = 3  # number of mixture components

with pm.Model(coords={"cluster": range(k)}) as model_mix:
    μ = pm.Normal(
        "μ",
        mu=0,
        sigma=0.01,
        transform=pm.distributions.transforms.ordered,
        initval=[1.72, 1.78, 1.88],
        dims="cluster",
    )
    # A different prior sigma for each component's scale
    σ = pm.HalfNormal("σ", sigma=np.array([0.025, 0.025, 0.01]), dims="cluster")
    weights = pm.Dirichlet("w", np.ones(k), dims="cluster")
    pred = pm.NormalMixture("x", w=weights, mu=μ, sigma=σ, observed=data)

    trace_mix = pm.sample()

with model_mix:
    trace_mix.extend(pm.sample_prior_predictive())
    trace_mix.extend(pm.sample_posterior_predictive(trace_mix))

plt.hist(trace_mix.prior_predictive["x"].to_numpy().reshape(-1))

However, sampling from the prior predictive as above just gives a normal-like distribution with a mean of 0. So something isn’t right.

I’m curious what the difference is between the formulation above using NormalMixture and the example from here.

I would probably have to specify two priors for each of the distributions under this formulation (so six priors in total; I sketch what that would look like after the example):

# 2-Mixture Poisson using iterable of distributions.
with pm.Model() as model:
    lam1 = pm.Exponential('lam1', lam=1)
    lam2 = pm.Exponential('lam2', lam=1)

    pois1 = pm.Poisson.dist(mu=lam1)
    pois2 = pm.Poisson.dist(mu=lam2)

    w = pm.Dirichlet('w', a=np.array([1, 1]))

    like = pm.Mixture('like', w=w, comp_dists=[pois1, pois2], observed=data)
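For my three normal components, that would presumably look something like this (my untested sketch; the hyperprior widths are placeholders):

with pm.Model() as model_6priors:
    # Two priors per component: one for the location, one for the scale
    mu1 = pm.Normal('mu1', mu=1.72, sigma=0.05)
    mu2 = pm.Normal('mu2', mu=1.78, sigma=0.05)
    mu3 = pm.Normal('mu3', mu=1.88, sigma=0.05)
    sigma1 = pm.HalfNormal('sigma1', sigma=0.025)
    sigma2 = pm.HalfNormal('sigma2', sigma=0.025)
    sigma3 = pm.HalfNormal('sigma3', sigma=0.01)

    comps = [pm.Normal.dist(mu=mu1, sigma=sigma1),
             pm.Normal.dist(mu=mu2, sigma=sigma2),
             pm.Normal.dist(mu=mu3, sigma=sigma3)]

    w = pm.Dirichlet('w', a=np.ones(3))
    like = pm.Mixture('like', w=w, comp_dists=comps, observed=data)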

Given that the prior on the means of all three components is centered at zero, isn’t this what one would expect?
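Note that initval only sets where the sampler starts; it doesn’t shift the prior itself. To get a prior predictive near your data, you’d center the μ prior on the expected modes, something like this (sketch of your model above with just the μ prior changed):

data = np.array([1.73, 1.59, 1.64, 1.58])
k = 3

with pm.Model(coords={"cluster": range(k)}) as model_mix:
    μ = pm.Normal(
        "μ",
        mu=np.array([1.72, 1.78, 1.88]),  # prior means at the expected modes, not 0
        sigma=0.01,
        transform=pm.distributions.transforms.ordered,
        initval=[1.72, 1.78, 1.88],
        dims="cluster",
    )
    σ = pm.HalfNormal("σ", sigma=np.array([0.025, 0.025, 0.01]), dims="cluster")
    weights = pm.Dirichlet("w", np.ones(k), dims="cluster")
    pred = pm.NormalMixture("x", w=weights, mu=μ, sigma=σ, observed=data)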


So using the formulation from the first link with an array of 3 different sigmas worked. How do I think about the two different ways of specifying mixture models, pm.NormalMixture versus pm.Mixture? Are they alternate ways of specifying the same thing (when I specify normal distributions with pm.Mixture), or is there a fundamental difference between them?

My understanding is that NormalMixture and MixtureSameFamily are convenience classes to simplify common use-cases, though there could be slight tweaks available in these special cases.

pm.NormalMixture simply uses pm.Mixture under the hood.
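So, modulo the convenience, these two should define essentially the same model (my sketch; I believe recent PyMC versions also accept a single batched distribution as comp_dists, with the last axis as the mixture axis):

import numpy as np
import pymc as pm

data = np.array([1.73, 1.59, 1.64, 1.58])

with pm.Model() as model_a:
    w = pm.Dirichlet("w", a=np.ones(3))
    μ = pm.Normal("μ", mu=0, sigma=1, shape=3)
    σ = pm.HalfNormal("σ", sigma=1, shape=3)
    x = pm.NormalMixture("x", w=w, mu=μ, sigma=σ, observed=data)

with pm.Model() as model_b:
    w = pm.Dirichlet("w", a=np.ones(3))
    μ = pm.Normal("μ", mu=0, sigma=1, shape=3)
    σ = pm.HalfNormal("σ", sigma=1, shape=3)
    # Equivalent explicit mixture of three Normal components
    x = pm.Mixture("x", w=w, comp_dists=pm.Normal.dist(mu=μ, sigma=σ, shape=3), observed=data)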
