Gaussian mixture - Infer probability of latent variable on new data

Hello,
I’m trying to compute the (posterior) probability of the latent variable of a Gaussian mixture model for a new sample, in order to get a “soft clustering” of new samples.

I generate fake data with:

import numpy as np

ndata = 500
k = 3  # number of mixture components
mus = np.array([-11, 0, 7])  # true component means
zs = np.random.randint(0, k, ndata)  # true component assignments
y_obs = mus[zs] + np.random.normal(size=ndata)


Following the example notebook I define the model:

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    p = pm.Dirichlet("p", a=np.ones(k), shape=k)   # mixture weights
    z = pm.Categorical("z", p=p, shape=ndata)      # latent component assignments

    μ = pm.Normal("μ", mu=[-1, 0, 1], sigma=15, shape=k)

    # impose μ[0] < μ[1] < μ[2]
    order_means_potential = pm.Potential(
        "order_means_potential",
        tt.switch(μ[0] > μ[1], -np.inf, 0) + tt.switch(μ[1] > μ[2], -np.inf, 0)
    )

    σ = pm.Uniform("σ", lower=0, upper=20)
    y = pm.Normal("y", mu=μ[z], sigma=σ, observed=y_obs)

Then I sample from the posterior and from the posterior predictive:

with model:
    trace = pm.sample(1000)
    posterior_pred_trace = pm.sample_posterior_predictive(trace)

I compare the posterior predictive with the observations:

import matplotlib.pyplot as plt

plt.hist(y_obs, bins=40, density=True)
plt.hist(posterior_pred_trace['y'].reshape(-1), bins=40, density=True);


Now I would like to compute the probability of z for a new observation y_\mathrm{new}, in other words p(z = k| y_\mathrm{new}, \mathcal Y_\mathrm{obs}). I understand that:

p(z=k|y_\mathrm{new},\mathcal Y_\mathrm{obs}) = \frac{p(y_\mathrm{new}, z=k|\mathcal Y_\mathrm{obs})}{\sum_{k'} p(y_\mathrm{new}, z=k'|\mathcal Y_\mathrm{obs})} = \frac{p(y_\mathrm{new}|z=k,\mathcal Y_\mathrm{obs})\,p(z=k|\mathcal Y_\mathrm{obs})}{\sum_{k'} p(y_\mathrm{new}|z=k',\mathcal Y_\mathrm{obs})\,p(z=k'|\mathcal Y_\mathrm{obs})}

but it is not clear to me how to proceed and estimate this value. Can this be estimated using the already computed posterior trace, or must I estimate the density (MAP or VI) first? Thank you! :slight_smile:


Yes, you will need to use the trace to do that, something along the lines of:

component_logp = pm.Normal.dist(trace['μ'], trace['σ']).logp(y_new)

and then normalize the component_logp.
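For example (a NumPy/SciPy sketch: y_new is a hypothetical scalar, trace['σ'] is reshaped so it broadcasts against the (n_samples, k) array of means, and .eval() turns the Theano expression into numbers):

import numpy as np
from scipy.special import logsumexp

y_new = 2.5  # hypothetical new observation

# per-sample, per-component log N(y_new | μ, σ): shape (n_samples, k)
component_logp = pm.Normal.dist(trace['μ'], trace['σ'][:, None]).logp(y_new).eval()

# normalize across components in log space for numerical stability
pz = np.exp(component_logp - logsumexp(component_logp, axis=1, keepdims=True))
pz.mean(axis=0)  # average over the posterior samples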

Also, you should use a NormalMixture instead of the latent-variable formulation the notebook uses (the notebook should be updated).
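For this model, the marginalized version would look something like this (a sketch; an ordered transform with a testval replaces the Potential-based ordering of the means):

with pm.Model() as marginalized_model:
    p = pm.Dirichlet("p", a=np.ones(k))
    μ = pm.Normal("μ", mu=[-1, 0, 1], sigma=15, shape=k,
                  transform=pm.transforms.ordered,
                  testval=np.array([-1.0, 0.0, 1.0]))
    σ = pm.Uniform("σ", lower=0, upper=20)
    # the latent z is summed out inside NormalMixture
    y = pm.NormalMixture("y", w=p, mu=μ, sigma=σ, observed=y_obs)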

There is actually already a marginalized Gaussian mixture notebook: Marginalized Gaussian Mixture Model — PyMC3 3.10.0 documentation

We should merge the non-marginalized one into the marginalized version, and explain why the latent-variable approach is a bad idea when you try to apply it to more complicated problems.

I agree, I had already opened an issue for this on pymc-examples: Combine Marginalized and Latent Gaussian Mixture Notebooks? · Issue #32 · pymc-devs/pymc-examples · GitHub


@junpenglao many thanks for answering. I’m trying to understand your argument formally, though I’m not sure whether my computations are rigorous. If you or anyone else has the time and inclination to read through them, I would be really grateful :slight_smile:

Denote by \theta = (\vec p, \vec \mu, \sigma) the parameters and by y the observed data, so that (discarding the z's) we are sampling \{\theta^{(s)}\} \sim p(\theta|y). Denote by \widetilde y the new observation and by \widetilde z the corresponding new latent variable we want to infer; then

\begin{align*}p(\widetilde z| \widetilde y, y) &= \int p(\widetilde z, \theta|\widetilde y, y)\, d\theta\\ &= \int p(\widetilde z|\widetilde y, \theta)\, p(\theta|\widetilde y, y)\, d\theta \\& \approx \frac{1}{S} \sum_{s=1}^S p(\widetilde z| \widetilde y, \theta^{(s)})\end{align*}

(in the second line y drops out of the first factor because \widetilde z is conditionally independent of y given \theta; in the last line I plug in the posterior samples \theta^{(s)} \sim p(\theta|y), i.e. I treat p(\theta|\widetilde y, y) \approx p(\theta|y), since a single new point hardly changes the posterior over \theta)

and

\begin{align*}p(\widetilde z| \widetilde y, \theta^{(s)})=\frac{p(\widetilde y|\widetilde z, \theta^{(s)})p(\widetilde z| \theta^{(s)})}{\sum_{\widetilde z}p(\widetilde y|\widetilde z, \theta^{(s)})p(\widetilde z| \theta^{(s)})}\end{align*}

For the Gaussian mixture this last formula should become:

\begin{align*}p(\widetilde z| \widetilde y, \vec p^{(s)}, \vec \mu^{(s)}, \sigma^{(s)})&=\frac{p(\widetilde y|\widetilde z, \vec \mu^{(s)}, \sigma^{(s)})p(\widetilde z| \vec p^{(s)})}{\sum_{\widetilde z}p(\widetilde y|\widetilde z, \vec \mu^{(s)}, \sigma^{(s)})p(\widetilde z| \vec p^{(s)})}\\ &=\frac{N(\widetilde y|\mu^{(s)}_{\widetilde z}, \sigma^{(s)})\cdot p^{(s)}_{\widetilde z}}{\sum_{\widetilde z}N(\widetilde y|\mu^{(s)}_{\widetilde z}, \sigma^{(s)})\cdot p^{(s)}_{\widetilde z}}\end{align*}

I can compute this as you suggest; however, it looks like I should also multiply the normal probability by trace['p'] component-wise before normalizing. Is this correct?

Do you know if this argument is rigorous, or if it can be found in any reference?
Thank you

Yes, you should multiply by p, the posterior mixture weights (or add the log weights).
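Continuing the NumPy sketch from above (same component_logp and y_new, with logsumexp from scipy.special):

# add the log of the posterior mixture weights p to the per-component
# log-densities, then normalize in log space
log_w = component_logp + np.log(trace['p'])
pz = np.exp(log_w - logsumexp(log_w, axis=1, keepdims=True))
pz.mean(axis=0)  # estimate of p(z | y_new, data), averaged over the posterior samples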

Thank you @ricardoV94, here is what I get. I have generated new data giving different probabilities over the classes:

k = 3
ndata = 1000

def generate_data():
    centers = np.array([-11, 1, 9])
    probs = np.array([.3, .1, .6])
    zs = np.random.choice(np.arange(k), ndata, p=probs)
    y_obs = centers[zs] + np.random.normal(size=ndata, scale=2)
    np.random.shuffle(y_obs)
    return y_obs

y_obs = generate_data()

First I build a Theano function:

import theano

# symbolic inputs: component means, common σ, mixture weights, and the new point
tμ, tσ, tp = tt.vector('tμ'), tt.scalar('tσ'), tt.vector('tp')
ty_new = tt.scalar('ty_new')

# unnormalized p(z | y_new, μ, σ, p) for a single posterior sample
pz_unnorm = tt.exp(pm.Normal.dist(mu=tμ, sigma=tσ).logp(ty_new) + tt.log(tp))
pz = pz_unnorm / tt.sum(pz_unnorm)

pz_fn = theano.function([ty_new, tμ, tσ, tp], pz)
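As a quick check, it can be evaluated on a single posterior sample with a hypothetical new point:

# one posterior sample, hypothetical y_new = 2.5; returns k probabilities summing to 1
pz_fn(2.5, trace['μ'][0], trace['σ'][0], trace['p'][0])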

then I compute the “posterior predictive conditional probabilities” (I don’t know the proper name) of z given y_new over a range of y_new values:

import pandas as pd

# p(z | y_new, data), averaged over the posterior samples, on a grid of y_new values
zs = pd.DataFrame(index=np.linspace(-20, 20, 100), columns=range(k))

for y_new in zs.index:
    v = np.empty((len(trace), k))
    for s in range(len(trace)):
        v[s] = pz_fn(y_new, trace['μ'][s], trace['σ'][s], trace['p'][s])
    zs.loc[y_new] = v.mean(axis=0)
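These can then be plotted with something along these lines (a sketch; one curve per component):

fig, ax = plt.subplots()
for component in range(k):
    # the DataFrame was created empty, so cast its object-dtype columns to float
    ax.plot(zs.index, zs[component].astype(float), label=f"p(z={component} | y_new)")
ax.set_xlabel("y_new")
ax.set_ylabel("posterior probability")
ax.legend();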

The resulting plot shows how the probabilities of the z's change over different values of y (out of curiosity, I have repeated the computation without the p_{\widetilde z} factor; it is plotted in gray in the lower panel).

This looks reasonable. Now I’m going to replicate the same with the marginalized mixture as you suggest. Thank you for your help!
