Gaussian Mixture Model + pm.Deterministic "dims"

Hi everyone,

I am trying to implement a Gaussian Mixture Model to describe a multimodal dataset. Unlike what is implemented here, where μ follows a Normal distribution, I want to model μ with a linear model.

----------------------------------------------------------------------------------------------------------------------------------------
Code:
Generate synthetic data:

import numpy as np
import arviz as az
import matplotlib.pyplot as plt
import pymc as pm

RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)
az.style.use("arviz-darkgrid")

def standardize(series):
    """Standardize a pandas series"""
    return (series - series.mean()) / series.std()

N, k = 500, 3
centers = np.array([-5, 0, 5])
sds = np.array([0.40, 2.0, 0.20])
idx = rng.integers(0, k, N)
true_a, true_b = 0.5, 3.0
predictor = rng.normal(loc=centers[idx], scale=sds[idx], size=N)
true_mu = true_a + true_b * predictor

outcome = rng.normal(loc=true_mu, scale=sds[idx], size=N)
predictor_scaled = standardize(predictor)
outcome_scaled = standardize(outcome)

plt.hist(outcome_scaled)
plt.show()

----------------------------------------------------------------------------------------------------------------------------------------
Model:

with pm.Model(coords={"cluster": range(k)}) as model:
    a = pm.Normal("a", 0.0, 10.0)
    b = pm.Normal("b", 0.0, 10.0)
        
    μ = pm.Deterministic("μ", var=a + b * predictor_scaled, dims="cluster")  # shape (N,) clashes with the length-3 "cluster" coord
    
    σ = pm.HalfNormal("σ", sigma=1, dims="cluster")
    weights = pm.Dirichlet("w", np.ones(k), dims="cluster")    
    
    pm.NormalMixture("x", w=weights, mu=μ, sigma=σ, observed=outcome_scaled)
    idata = pm.sample()

---------------------------------------------------------------------------------------------------------------------------------------
Problem:

Sampling fails with a dimension-incompatibility error: μ has length N = 500, but dims="cluster" labels a coordinate of length k = 3.
----------------------------------------------------------------------------------------------------------------------------------------
How can I change pm.Deterministic() to accept the predictor_scaled variable and the “dims” parameter without producing incompatibility of dimensions?

Deterministic does not modify the shape of the value; it only marks something you want to track in the MCMC output.
Instead, you should expand the shape of the input to NormalMixture, something like μ[..., None]:

with pm.Model(coords={"cluster": range(k)}) as model:
    a = pm.Normal("a", 0.0, 10.0)
    b = pm.Normal("b", 0.0, 10.0)

    μ = pm.Deterministic("μ", var=a + b * predictor)
    
    σ = pm.HalfNormal("σ", sigma=1, dims="cluster")
    weights = pm.Dirichlet("w", np.ones(k), dims="cluster")    
    
    pm.NormalMixture("x", w=weights, mu=μ[..., None], sigma=σ, observed=outcome)
    idata = pm.sample()

This now samples, but you will likely see mode switching during sampling; adding a transformation should help the inference result:

with pm.Model(coords={"cluster": range(k)}) as model:
    a = pm.Normal("a", 0.0, 10.0)
    b = pm.Normal("b", 0.0, 10.0)

    μ = pm.Deterministic("μ", var=a + b * predictor)
    
    σ = pm.HalfNormal("σ", sigma=1, dims="cluster",
                      transform=transforms.Chain([
                          transforms.LogTransform(),
                          transforms.Ordered(),
                      ]),
                      initval=np.asarray([1., 2., 3.])
                     )
    weights = pm.Dirichlet("w", np.ones(k), dims="cluster")    
    
    pm.NormalMixture("x", w=weights, mu=μ[..., None], sigma=σ, observed=outcome)
    idata = pm.sample()

Thank you very much for the quick reply!

Regarding the transform argument of pm.HalfNormal(), I am not able to use it, because transforms yields "NameError: name 'transforms' is not defined". I am running PyMC v4.4.0.

However, when using the first option, idata.posterior yields what I show below.
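For reference, the dims can be inspected directly on the posterior:

print(idata.posterior["μ"].dims)  # ('chain', 'draw', 'μ_dim_0') with μ_dim_0 = 500
print(idata.posterior["σ"].dims)  # ('chain', 'draw', 'cluster') with cluster = 3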

Given that μ_dim_0 = len(outcome) (the length of both the predictors and the observations, i.e. the dataset), I cannot plot the same way as in the example:

from scipy.stats import norm
from xarray_einstats.stats import XrContinuousRV

xi = np.linspace(-7, 7, 500)
post = idata.posterior
pdf_components = XrContinuousRV(norm, post["μ"], post["σ"]).pdf(xi) * post["w"]
pdf = pdf_components.sum("cluster")

fig, ax = plt.subplots(3, 1, figsize=(7, 8), sharex=True)

####### empirical histogram
ax[0].hist(outcome_scaled, 50)
ax[0].set(title="Data", xlabel="x", ylabel="Frequency")

####### pdf
pdf_components.mean(dim=["chain", "draw"]).sum("cluster").plot.line(ax=ax[1])
ax[1].set(title="PDF", xlabel="x", ylabel="Probability\ndensity")

####### plot group membership probabilities
(pdf_components / pdf).mean(dim=["chain", "draw"]).plot.line(hue="cluster", ax=ax[2])
ax[2].set(title="Group membership", xlabel="x", ylabel="Probability")

When trying to plot as above, I get an error because the shapes of μ and σ are incompatible.

@Juan_Pablo The transforms are now in the longer path pymc.distributions.transforms
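For example, with that path the import becomes (this should make the transform code above resolve):

from pymc.distributions import transforms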

Excellent, thanks! Now it’s working.

I still don’t understand how to handle the difference in the shape of the posterior of “μ” (chain,draw,μ_dim_0) and of “σ” (chain,draw,cluster) that I mentioned above; which is blocking me from plotting.

Basically, how can I transform the shape of “μ” (chain,draw,μ_dim_0) to (chain,draw,cluster)?

You can pass dims to your Deterministic; what @junpenglao meant is that they are just labels. They don't affect the number of dimensions; you have to change that yourself (e.g., with the [..., None]).

You can use dims to label them in the final trace.
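For instance, a minimal sketch (assuming a hypothetical "obs" coordinate of length N for the data dimension):

with pm.Model(coords={"cluster": range(k), "obs": range(N)}) as model:
    a = pm.Normal("a", 0.0, 10.0)
    b = pm.Normal("b", 0.0, 10.0)
    # dims only labels the existing (N,)-shaped result; it does not reshape it
    μ = pm.Deterministic("μ", var=a + b * predictor_scaled, dims="obs")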

If it’s not clear can you share your latest model code?

Okay, I follow. My main problem is not regarding the label but the difference in shape.


from pymc.distributions import transforms

with pm.Model(coords={"cluster": range(k)}) as model:
    a = pm.Normal("a", 0.0, 10.0)
    b = pm.Normal("b", 0.0, 10.0)

    μ = pm.Deterministic("μ", var=a + b * predictor_scaled)

    σ = pm.HalfNormal("σ", sigma=1, dims="cluster",
                      transform=transforms.Chain([
                          transforms.LogTransform(),
                          transforms.Ordered(),
                      ]),
                      initval=np.asarray([1., 2., 3.])
                     )
    weights = pm.Dirichlet("w", np.ones(k), dims="cluster")

    pm.NormalMixture("x", w=weights, mu=μ[..., None], sigma=σ, observed=outcome_scaled)
    idata = pm.sample()

If you display the idata object you'll see that:
shape(σ) = (chain, draw, cluster) = (4, 1000, 3)
shape(μ) = (chain, draw, μ_dim_0) = (4, 1000, 500).

But if you look at the example of Gaussian Mixture Model (mentioned earlier in this thread), where μ is described with a Normal distribution (not a linear model) then shape(μ) = (chain, draw, cluster) = (4, 1000, 3).

How could I get the same kind of result, but instead of modeling μ with pm.Normal(…, dims="cluster"), model it as pm.Deterministic(…, var=a + b * predictor, …)?

I don't think it is possible with the setup of your model (and data-generating process), because μ was specified to be the same for each mixture component. After broadcasting, you actually have:

mixture_shape = (num_batch, num_cluster) = (500, 3),

which means:

shape(σ) = (1, 3)
shape(μ) = (500, 1)
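A quick NumPy sketch of that broadcasting (illustrative only):

import numpy as np

mu = np.zeros((500, 1))    # one μ per observation, shared across clusters
sigma = np.ones((1, 3))    # one σ per cluster, shared across observations
print(np.broadcast_shapes(mu.shape, sigma.shape))  # (500, 3)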

Okay, thank you for your answer.

I changed part of the model setup. Now I generate the predictor inside the model, so that μ is still generated by the linear model, but the predictor follows a normal distribution: N(-5, 0.40) for the first mixture component, N(0, 2.0) for the second, and N(5, 0.20) for the third.

from pymc.distributions import transforms

with pm.Model(coords={"cluster": range(k)}) as model:
    a = pm.Normal("a", 0.0, 10.0)
    b = pm.Normal("b", 0.0, 10.0)

    predictor = pm.Normal("predictor", mu=np.array([-5, 0, 5]), sigma=np.array([0.40, 2.0, 0.20]), dims="cluster")
    μ = pm.Deterministic("μ", var=a + b * predictor, dims="cluster")

    σ = pm.HalfNormal("σ", sigma=1, dims="cluster",
                      transform=transforms.Chain([
                          transforms.LogTransform(),
                          transforms.Ordered(),
                      ]),
                      initval=np.asarray([1., 2., 3.])
                     )
    weights = pm.Dirichlet("w", np.ones(k), dims="cluster")

    pm.NormalMixture("x", w=weights, mu=μ, sigma=σ, observed=outcome_scaled)
    idata = pm.sample()

What does not convince me is that I want the predictor to be normalized like the observations, while preserving the presence of mixture components (i.e. not using predictor = pm.Normal(…, mu=0, sigma=1, …)). Below I show what predictor_scaled looks like when generating the data outside the model:

RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)
az.style.use("arviz-darkgrid")
def standardize(series):
    """Standardize a pandas series"""
    return (series - series.mean()) / series.std()

N, k = 500, 3
centers = np.array([-5, 0, 5])
sds = np.array([0.40, 2.0, 0.20])
idx = rng.integers(0, k, N)
true_a, true_b = 0.5, 3.0
predictorr = rng.normal(loc=centers[idx], scale=sds[idx], size=N)
true_mu = true_a + true_b * predictorr

outcome = rng.normal(loc=true_mu, scale=sds[idx], size=N)
predictor_scaled = standardize(predictorr)
outcome_scaled = standardize(outcome)

plt.hist(predictor_scaled)
plt.show()

------------------------------------------------------------------------------------------------------------------------
Questions:

  1. Is it possible to normalize the predictor = pm.Normal(…) object (inside the model structure) while preserving the different mixture components?
  2. If not, how could I model a multimodal dataset using a linear model for μ? I understand you said it is not possible given my setup, so what should I change?

Thanks for your time!

I would use the un-normalized observed and predictor directly. The scaling can be captured by the sigma, so I don't really see the need to normalize the observed or the predictor.
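A minimal sketch of that suggestion (assuming centers, sds, and the raw outcome from the data-generation code above):

with pm.Model(coords={"cluster": range(k)}) as model:
    a = pm.Normal("a", 0.0, 10.0)
    b = pm.Normal("b", 0.0, 10.0)

    # raw (un-normalized) predictor, one component per cluster
    predictor = pm.Normal("predictor", mu=centers, sigma=sds, dims="cluster")
    μ = pm.Deterministic("μ", var=a + b * predictor, dims="cluster")

    # σ absorbs the overall scale of the raw data
    σ = pm.HalfNormal("σ", sigma=1, dims="cluster")
    weights = pm.Dirichlet("w", np.ones(k), dims="cluster")

    pm.NormalMixture("x", w=weights, mu=μ, sigma=σ, observed=outcome)
    idata = pm.sample()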