Problem with NormalMixture

I have a dataset that has a bimodal distribution, and I want to understand what influences the chance of a data point being in one mode or the other. Here’s my model:

# classes = 2, predictors.shape = (129,9) and data.shape = (129,)
with pm.Model(coords={"classes": range(classes), "predictors": predictors.columns}) as class_model:
    # Priors
    sigma = pm.HalfNormal("sigma", sigma=1)
    centers = pm.Normal(
        "centers",
        mu=[data_mean, data_mean],
        sigma=10,
        transform=pm.distributions.transforms.univariate_ordered,
        initval=[1, 2],
        dims="classes"
    )

    intercept = pm.Normal('intercept', mu=0, sigma=20)
    beta = pm.Normal("beta", mu=0, sigma=1, dims="predictors")
    theta = intercept + pm.math.dot(predictors, beta)
    p = pm.Deterministic("p", pm.math.invlogit(theta))
    w = [p, 1-p]

    # Likelihood
    y = pm.NormalMixture("obs", w=w, mu=centers, sigma=sigma, observed=data)

When I run this model, I get this error:
ValueError: Incompatible Elemwise input shapes [(2, 129), (129, 2)]
I’m guessing this happens when making the w array. There are 129 entries in my dataset, but I don’t know where the second dimension is coming from, or why they’re flipped.

I wrote a simpler model without any predictors (just sampling p from a normal distribution) and it worked. I think there’s something I’m missing about the shapes here. Any help would be appreciated.
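
For reference, the version without predictors looked roughly like this (a sketch rather than my exact code; I put a Normal through invlogit so p stays in (0, 1)). The key difference is that p is a single scalar, so w is just a length-2 list of scalars:

classes = 2
with pm.Model(coords={"classes": range(classes)}) as simple_model:
    sigma = pm.HalfNormal("sigma", sigma=1)
    centers = pm.Normal(
        "centers",
        mu=[data_mean, data_mean],
        sigma=10,
        transform=pm.distributions.transforms.univariate_ordered,
        initval=[1, 2],
        dims="classes",
    )

    # One scalar weight for the whole dataset instead of one per observation
    theta = pm.Normal("theta", mu=0, sigma=1)
    p = pm.Deterministic("p", pm.math.invlogit(theta))
    w = [p, 1 - p]

    y = pm.NormalMixture("obs", w=w, mu=centers, sigma=sigma, observed=data)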

It would help if you could share a reproducible snippet. It's unclear what the shapes of your data and dims are.

The data is sensitive, so I can't post a snippet. The shapes of the predictors and observed data are (129, 9) and (129,) respectively (nine predictor columns and one observed column). I can generate some fake data if the shapes aren't enough. There are 2 classes.

Here’s a reproducible example of what I’m seeing. This version only uses two predictors, but has the same number of observations.

import pymc as pm
import numpy as np
import pandas as pd

# Create fake data
mus = np.array([4, 14])
sigmas = np.array([2, 3])

predictors = pd.DataFrame({
    "male": pm.draw(pm.Categorical.dist(p=[0.5, 0.5]), draws=129),
    "age": pm.draw(pm.Normal.dist(mu=35, sigma=3), draws=129) 
})

# Per-observation mixture weights that depend on the predictors,
# rescaled so they stay inside (0, 1)
p = pm.draw(pm.Normal.dist(mu=1 + predictors["age"] * 0.25 + predictors["male"] * 1.0))
p = p / (np.max(p) + 0.1)

# Pick a class for each observation, then draw from that class's mean
index = np.array(pm.draw(pm.Categorical.dist(p=np.column_stack((p, 1 - p))), draws=1))
data = pm.draw(pm.Normal.dist(mu=mus[index], sigma=2), draws=1)

# Model
classes = 2
with pm.Model(coords={"classes": range(classes), "predictors": predictors.columns}) as model:
    # Priors
    sigma = pm.HalfNormal("sigma", sigma=1)
    centers = pm.Normal(
        "centers",
        mu=[data.mean(), data.mean()],
        sigma=10,
        transform=pm.distributions.transforms.univariate_ordered,
        initval=[1, 2],
        dims="classes"
    )

    intercept = pm.Normal('intercept', mu=0, sigma=20)
    beta = pm.Normal("beta", mu=0, sigma=1, dims="predictors")
    theta = intercept + pm.math.dot(predictors, beta)
    p = pm.Deterministic("p", pm.math.invlogit(theta))
    w = [p, 1-p]

    # Likelihood
    y = pm.NormalMixture("obs", w=w, mu=centers, sigma=sigma, observed=data)

# Train
with model:
    idata = pm.sample()

I figured it out. Changing the w term to this fixes the issue:
w = pm.math.stack([p, 1-p], axis=1)
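
For anyone who hits the same error: a plain Python list like [p, 1-p] gets stacked along a new leading axis, so w ends up with shape (2, 129), but NormalMixture wants the mixture components on the last axis, shape (129, 2), to line up with the two centers. Here's a quick shape check (illustrative only, with random numbers standing in for the real weights):

import numpy as np
import pymc as pm

p = np.random.uniform(size=129)  # stand-in for the 129 per-observation weights

w_list = pm.math.stack([p, 1 - p])            # default axis=0: new axis goes first
print(tuple(w_list.shape.eval()))             # (2, 129), the shape from the error

w_fixed = pm.math.stack([p, 1 - p], axis=1)   # components on the last axis
print(tuple(w_fixed.shape.eval()))            # (129, 2), lines up with centers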

Thanks!

Thanks for posting the solution!