Correctly specifying a model with a multimodal distribution

Hello PyMC3 community.

I’m trying to build a model based on a multimodal distribution.

I have the following code, but am not sure that it’s correct:

  1. The way I’m specifying the component priors uses the .dist syntax, which makes me think they aren’t treated as stochastic variables.
  2. If I plot the prior predictive and posterior predictive samples, they are not very different from each other.
import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt

W = np.array([0.5, 0.5, 0.5])

data = np.array([1.73, 1.59, 1.64, 1.58])

with pm.Model() as model_mix:
    # Mixing weights
    w = pm.Dirichlet('w', np.ones_like(W))

    # Mixture components, defined with the unnamed .dist syntax
    dist1 = pm.Normal.dist(mu=1.72, sigma=0.01)
    dist2 = pm.Normal.dist(mu=1.78, sigma=0.01)
    dist3 = pm.StudentT.dist(mu=1.88, sigma=0.01, nu=5)

    like = pm.Mixture('like', w=w, comp_dists=[dist1, dist2, dist3], observed=data)

    trace = pm.sample(2000, tune=8000, cores=1)

params_prior_mix = pm.sample_prior_predictive(samples=2000, model=model_mix)
params_pred_mix = pm.sample_posterior_predictive(trace, samples=1000, model=model_mix)
predictions_vec_mix = params_pred_mix['like'].reshape(-1)  # 1000 draws x 4 observations

plt.hist(np.random.choice(predictions_vec_mix, 1000), label='Posterior', color='r')
plt.hist(params_prior_mix['like'][:,0], label='Prior')
plt.hist(data, color='pink', label='Measured Distribution')
plt.legend()    

Do I need to model the distributions of the parameters of dist1, dist2, dist3 in order to correctly specify the priors?

Have you checked this notebook?

Regarding your prior and posterior samples looking similar: you have very little data (four points), so the posterior staying close to the prior may be exactly what’s expected.
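To see why, here is a toy conjugate-normal calculation (standalone numpy, numbers made up, not your model): with a handful of observations the posterior standard deviation shrinks only modestly below the prior’s, so prior and posterior predictive samples look alike.

```python
import numpy as np

prior_sd = 1.0  # prior on the mean: N(0, 1)
noise_sd = 1.0  # known observation noise

def posterior_sd(n):
    # Conjugate update: posterior precision = prior precision + n / noise_sd**2
    return 1.0 / np.sqrt(1.0 / prior_sd**2 + n / noise_sd**2)

print(posterior_sd(4))    # ~0.447: with 4 points, still the same order as the prior sd of 1
print(posterior_sd(400))  # ~0.05:  with lots of data, the posterior tightens sharply
```

The amount of shrinkage is driven entirely by n here, which is the effect you’re seeing.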

I hadn’t seen this, very useful! The article refers to pymc v4, but pymc3 also has the NormalMixture method. I tried it using pymc3 and sampling is very slow. Is NormalMixture optimized in v4 to run faster, by chance?

The pymc3 version of that notebook is here. Sampling mixture models like these tends to be difficult due to the categorical/discrete nature of some of the parameters. However, as you can see by comparing those two notebooks, there have been changes in v4. Whether they help speed-wise, I’m not sure, but it may be worth trying both.

I see, the pymc v4 version is considerably simpler. The generated data clearly has 3 modes, but it appears that the sigma prior uses the same parameters for every component (a HalfNormal distribution with sigma=1).

So is the idea here just to use a very weak prior? In my case, since my data is scarce, I would want to specify a different sigma for each of the component priors in the mixture. How would I do that?

You may be able to just pass in a vector of sigma values when defining σ. I’m not 100% sure of that, but you could sample from the prior predictive to check.
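For intuition, here’s what a per-component sigma amounts to, as a pure-numpy prior-predictive sketch (not PyMC; the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical component parameters: one sigma per component
mus = np.array([1.72, 1.78, 1.88])
sigmas = np.array([0.025, 0.025, 0.01])
w = rng.dirichlet(np.ones(3))       # mixture weights ~ Dirichlet(1, 1, 1)

# Prior-predictive draws: pick a component, then draw from it
z = rng.choice(3, size=5000, p=w)   # component assignments
x = rng.normal(mus[z], sigmas[z])   # per-component sigma is just per-draw indexing

print(x.min(), x.max())             # all draws land near the three modes
```

If a vector sigma works in the PyMC distribution, the prior-predictive histogram should look like this: three modes with different widths.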

So I tried the following:

data = np.array([1.73, 1.59, 1.64, 1.58])

k = 3  # number of mixture components

with pm.Model(coords={"cluster": range(k)}) as model_mix:
    μ = pm.Normal(
        "μ",
        mu=0,
        sigma=.01,
        transform=pm.distributions.transforms.ordered,
        initval=[1.72, 1.78, 1.88],
        dims="cluster",
    )
    σ = pm.HalfNormal("σ", sigma=np.array([.025, .025, 0.01]), dims="cluster")
    weights = pm.Dirichlet("w", np.ones(k), dims="cluster")
    pred = pm.NormalMixture("x", w=weights, mu=μ, sigma=σ, observed=data)

    trace_mix = pm.sample()

with model_mix:
    trace_mix.extend(pm.sample_prior_predictive())
    trace_mix.extend(pm.sample_posterior_predictive(trace_mix))

plt.hist(trace_mix.prior_predictive['x'].to_numpy().reshape(-1))

However, sampling from the prior predictive as above just gives a normal-like distribution with a mean of 0. So something isn’t right.

Curious what the difference is between the formulation above using NormalMixture versus the example from here.

I would probably have to specify 2 priors for each of the distributions under this formulation (so 6 priors in total):

# 2-component Poisson mixture using an iterable of distributions.
with pm.Model() as model:
    lam1 = pm.Exponential('lam1', lam=1)
    lam2 = pm.Exponential('lam2', lam=1)

    pois1 = pm.Poisson.dist(mu=lam1)
    pois2 = pm.Poisson.dist(mu=lam2)

    w = pm.Dirichlet('w', a=np.array([1, 1]))

    like = pm.Mixture('like', w=w, comp_dists=[pois1, pois2], observed=data)

Given that the prior on the means of all three components is centered at zero (initval only sets the sampler’s starting point; it does not change the prior), isn’t this what one would expect?
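A quick standalone numpy simulation of that prior confirms it (not the PyMC model itself; the draw count is arbitrary): every prior-predictive draw sits near 0, nowhere near the data around 1.6-1.9.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000  # number of prior-predictive draws (arbitrary)

# Priors as written in the model above
mu = rng.normal(0.0, 0.01, size=(n, 3))                             # mu ~ N(0, 0.01)
sigma = np.abs(rng.normal(0.0, [0.025, 0.025, 0.01], size=(n, 3)))  # HalfNormal via |Normal|
w = rng.dirichlet(np.ones(3), size=n)                               # w ~ Dirichlet(1, 1, 1)

# One mixture draw per prior sample: pick a component, then draw from it
z = np.array([rng.choice(3, p=wi) for wi in w])
x = rng.normal(mu[np.arange(n), z], sigma[np.arange(n), z])

print(x.mean())  # approximately 0
```

Centering the mu prior on plausible values (e.g. mu=[1.72, 1.78, 1.88]) rather than 0 would move the prior predictive to where the data actually lives.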

So using the formulation from the first link with an array of 3 different sigmas worked. How should I think about the two different ways of specifying mixture models, pm.NormalMixture versus pm.Mixture? Are they alternate ways of specifying the same thing (when I pass normal distributions to pm.Mixture), or is there a fundamental difference between them?

My understanding is that NormalMixture and MixtureSameFamily are convenience classes that simplify common use cases, though there may be slight optimizations available in these special cases.

pm.NormalMixture simply uses pm.Mixture under the hood.
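Mathematically they agree because a mixture’s density is just the weighted sum of the component densities, with the log-density computed via a logsumexp for numerical stability. A standalone numpy sketch of that formula (not PyMC internals):

```python
import numpy as np

def normal_logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def mixture_logpdf(x, w, mus, sigmas):
    # log sum_k w_k * N(x; mu_k, sigma_k), computed via logsumexp
    comp = np.log(w) + normal_logpdf(x, mus, sigmas)
    m = comp.max()
    return m + np.log(np.exp(comp - m).sum())

w = np.array([0.2, 0.5, 0.3])
mus = np.array([1.72, 1.78, 1.88])
sigmas = np.array([0.01, 0.01, 0.01])

# Same value as the naive weighted sum of densities
naive = np.log((w * np.exp(normal_logpdf(1.75, mus, sigmas))).sum())
print(np.isclose(mixture_logpdf(1.75, w, mus, sigmas), naive))  # True
```

So a pm.Mixture built from pm.Normal.dist components defines the same likelihood as pm.NormalMixture; any difference would be in implementation detail, not in the model.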