Gaussian mixture - Infer probability of latent variable on new data

lorenzgu · January 29, 2021, 7:21pm

Hello,
I’m trying to compute the (posterior) probability of the latent variable of a Gaussian mixture model over a new sample, in order to get a “soft clustering” for new samples.

I generate fake data with:

ndata = 500
k = 3  # number of mixture components
mus = np.array([-11, 0, 7])
zs = np.random.randint(0, k, ndata)
y_obs = mus[zs] + np.random.normal(size=ndata)

download1

Following the example notebook I define the model:

with pm.Model() as model:
    p = pm.Dirichlet("p", a=np.ones(k), shape=k)
    z = pm.Categorical("z", p=p, shape=ndata)

    μ = pm.Normal("μ", mu=[-1, 0, 1], sigma=15, shape=k)
    
    # impose μ[0] < μ[1] < μ[2]
    order_means_potential = pm.Potential(
        "order_means_potential",
        tt.switch(μ[0] > μ[1], -np.inf, 0) + tt.switch(μ[1] > μ[2], -np.inf, 0)
    )

    σ = pm.Uniform("σ", lower=0, upper=20)
    y = pm.Normal("y", mu=μ[z], sigma=σ, observed=y_obs)

Then I sample from the posterior and from the posterior predictive:

with model:
    trace = pm.sample(1000)
    posterior_pred_trace = pm.sample_posterior_predictive(trace)

I compare the posterior predictive with the observations:

plt.hist(y_obs, bins=40, density=True)
plt.hist(posterior_pred_trace['y'].reshape(-1), bins=40, density=True);

download

Now I would like to compute the probability of z for a new y_\mathrm{new}, in other words p(z = k| y_\mathrm{new}, \mathcal Y_\mathrm{obs}). I get that:

p(z=k|y_\mathrm{new},\mathcal Y_\mathrm{obs}) = \frac{p(y_\mathrm{new}, z=k|\mathcal Y_\mathrm{obs})}{\sum_k p(y_\mathrm{new}, z=k|\mathcal Y_\mathrm{obs})} = \frac{p(y_\mathrm{new}|z=k,\mathcal Y_\mathrm{obs})p(z=k|\mathcal Y_\mathrm{obs})}{\sum_k p(y_\mathrm{new}|z=k,\mathcal Y_\mathrm{obs})p(z=k|\mathcal Y_\mathrm{obs})}

but it is not clear to me how to proceed and estimate this value. Can this be estimated using the already computed posterior trace, or must I estimate the density (MAP or VI) first? Thank you!

junpenglao · February 1, 2021, 9:54am

Yes you will need to use the trace to do that, something along the line of:

component_logp = pm.Normal.dist(trace['μ'], trace['σ']).logp(y_new)

and then normalized the component_logp

Also, you should use a NormalMixture instead of a latent variable the notebook did (which the notebook should be updated)

ricardoV94 · February 1, 2021, 11:05am

There is actually already a Marginalized gaussian mixture notebook: Marginalized Gaussian Mixture Model — PyMC3 3.10.0 documentation

junpenglao · February 1, 2021, 12:39pm

We should move the non-marginalized one into the marginalized version, and explain why this is a bad idea when you try to apply to more complicated problems.

ricardoV94 · February 1, 2021, 12:58pm

I agree, I had already opened an issue for this on pymc-examples: Combine Marginalized and Latent Gaussian Mixture Notebooks? · Issue #32 · pymc-devs/pymc-examples · GitHub

lorenzgu · February 3, 2021, 10:03pm

@junpenglao many thanks for answering. I’m trying to understand formally your argument, I’m not sure if my computations are rigorous though. If you or anyone has time and will to read through I would be really grateful

Denoting \theta = (\vec p, \vec \mu, \sigma) the parameters and y the observed data, so that (discarding the z's) we are sampling \{\theta^{(s)}\} \sim p(\theta|\mathcal y). Denote \widetilde y the new observed variable and \widetilde z the corresponding new latent variable we want to infer, then

\begin{align*}p(\widetilde z| \widetilde y, y) &= \int p(\widetilde z, \theta|\widetilde y, y) d\theta\\ &= \int p(\widetilde z|\widetilde y, y\!\!\!/, \theta) p(\theta|\widetilde y\!\!\!/, y) d\theta \\& \approx \frac{1}{S} \sum_{s=1}^S p(\widetilde z| \widetilde y, \theta^{(s)})\end{align*}

and

\begin{align*}p(\widetilde z| \widetilde y, \theta^{(s)})=\frac{p(\widetilde y|\widetilde z, \theta^{(s)})p(\widetilde z| \theta^{(s)})}{\sum_{\widetilde z}p(\widetilde y|\widetilde z, \theta^{(s)})p(\widetilde z| \theta^{(s)})}\end{align*}

For the Gaussian mixture this last formula should become:

\begin{align*}p(\widetilde z| \widetilde y, \vec p^{(s)}, \vec \mu^{(s)}, \sigma^{(s)})&=\frac{p(\widetilde y|\widetilde z, \vec \mu^{(s)}, \sigma^{(s)})p(\widetilde z| \vec p^{(s)})}{\sum_{\widetilde z}p(\widetilde y|\widetilde z, \vec \mu^{(s)}, \sigma^{(s)})p(\widetilde z| \vec p^{(s)})}\\ &=\frac{N(\widetilde y|\mu^{(s)}_{\widetilde z}, \sigma^{(s)})\cdot p^{(s)}_{\widetilde z}}{\sum_{\widetilde z}N(\widetilde y|\mu^{(s)}_{\widetilde z}, \sigma^{(s)})\cdot p^{(s)}_{\widetilde z}}\end{align*}

Which I can compute as you suggest, however here it looks like I should multiply the normal probability by trace['p'] component-wise before normalizing, is this correct?

Do you know if this argument is rigorous, or if it can be found in any reference?
Thank you

ricardoV94 · February 4, 2021, 6:26am

Yes you should multiply by the p, the posterior mixture weights (or add the log).

lorenzgu · February 7, 2021, 11:09am

Thank you @ricardoV94, here is what I get. I have generated new data giving different probabilities over the classes:

k = 3
ndata = 1000

def generate_data():
    centers = np.array([-11, 1, 9])
    probs = np.array([.3, .1, .6])
    zs = np.random.choice(np.arange(k), ndata, p=probs)
    y_obs = centers[zs] + np.random.normal(size=ndata, scale=2)
    np.random.shuffle(y_obs)
    return y_obs

y_obs = generate_data()

First I build a Theano function:

tμ, tσ, tp = tt.vector('tμ'), tt.scalar('tσ'), tt.vector('tp')
ty_new = tt.scalar('ty_new')

pz_unnorm = tt.exp(pm.Normal.dist(mu=tμ, sigma=tσ).logp(ty_new) + tt.log(tp))
pz = pz_unnorm / tt.sum(pz_unnorm)

pz_fn = theano.function([ty_new, tμ, tσ, tp], pz)

then I compute the “posterior predictive conditional probabilities” (don’t know the proper name) of z given y_new over a range of y_new's:

import pandas as pd
zs = pd.DataFrame(index=np.linspace(-20, 20, 100), columns=range(k))

for y_new in zs.index:
    v = np.empty((len(trace), k))
    for s in range(len(trace)):
         v[s] = pz_fn(y_new, trace['μ'][s], trace['σ'][s], trace['p'][s])
    zs.loc[y_new] = v.mean(axis=0)

This is the plot of how the probabilities of the zeta’s change over different y’s (out of curiosity I have repeated the computation without the p_\widetilde{z}, it is plotted in gray in the lower plot):

This looks reasonable. Now I’m going to replicate the same with the marginalized mixture as you suggest, thank you for your help.

Topic		Replies	Views
Likelihood of data belonging to mixture component v5 modeling	3	235	November 7, 2023
Get probability of parameter given new data Questions	8	1947	January 18, 2019
Multivariate Gaussian Process Classification (GPC) and Prediction Questions	2	898	February 23, 2020
Where did the indicator go in the customed likelihood? Questions	1	377	December 9, 2019
Marginalizing out a categorical variable Questions	18	2199	April 1, 2021

Gaussian mixture - Infer probability of latent variable on new data

Related topics