Error when trying mixture distribution for likelihood (updated)

Hi all,

This is an updated version of a question I asked earlier (here). I’m trying to simplify it so I wanted to start anew.

I’m using PyMC3 to do a Bayesian ANOVA and would like to move from normal distributions in my likelihood to mixture ones that better match the data.

I have 5 different groups to compare in my data, but I’m reducing it to 2 here for simplicity’s sake. Here’s some sample code that works fine using a gaussian distribution for the likelihood:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import arviz as az


#####################
# Create a sample dataset
#####################

# Create some data for two groups
ages_group01 = pd.Series(np.round(np.random.normal(scale=5, size=500)) + 55)
ages_group02 = pd.Series(np.round(np.random.normal(scale=5, size=500)) + 35)

# Organize it into a data frame
sampleDF = pd.DataFrame(0, columns=['age', 'group'], index=range(0, 1000))
sampleDF.iloc[0:500, 0] = ages_group01
sampleDF.iloc[0:500, 1] = 1
sampleDF.iloc[500:1000, 0] = ages_group02
sampleDF.iloc[500:1000, 1] = 2

# Add dummy/indicator variables into the data frame
dummyVars = pd.get_dummies(sampleDF['group'])
dummyVars.columns = ['group01', 'group02']
sampleDF = sampleDF.join(dummyVars)


#####################
# Bayesian ANOVA
#####################

DV = 'age'

with pm.Model() as model:

    # Define priors
    sigma = pm.HalfCauchy('sigma', beta=10)
    group01 = pm.Normal('group01', mu=40, sd=10)
    group02 = pm.Normal('group02', mu=40, sd=10)

    likelihood = pm.Normal(DV,
                            mu=group01 * sampleDF['group01'] + group02 * sampleDF['group02'],
                            sd=sigma,
                            observed=sampleDF[DV])

    trace = pm.sample(draws=1000, tune=1000, chains=2, target_accept=0.95, cores=8, return_inferencedata=True)

However, when I change the PyMC model to the following:

#####################
# Bayesian ANOVA
#####################

DV = 'age'

with pm.Model() as model:

    # Define priors
    sigma = pm.HalfCauchy('sigma', beta=10)
    group01 = pm.Normal('group01', mu=40, sd=10)
    group02 = pm.Normal('group02', mu=40, sd=10)

    # Create two distributions
    w = pm.Dirichlet('w', a=np.array([0.3, 1]))
    dist1 = pm.HalfNormal.dist(sigma=0.8)
    dist2 = pm.Normal.dist(mu=group01 * sampleDF['group01'] + group02 * sampleDF['group02'], sigma=sigma)

    # Define likelihood using a misture of the two distributions instead of the gaussian
    likelihood = pm.Mixture(DV,
                            w=w,
                            comp_dists=[dist1, dist2],
                            observed=sampleDF[DV])

    trace = pm.sample(draws=1000, tune=1000, chains=2, target_accept=0.95, cores=8, return_inferencedata=True)

I end up getting the following error:

I can, however, use similar code to produce the mixture distribution that I’m looking for:

w = pm.Dirichlet('w', a=np.array([0.3, 1]))
dist1 = pm.HalfNormal.dist(sigma=0.8)
dist2 = pm.Normal.dist(mu=10, sigma=1.5)
customDist = pm.Mixture('customDist', w=w, comp_dists = [dist1, dist2])

That makes a distribution something like the following:


I'm assuming the error stems from how the component distributions are being specified in the second model, but I'm looking for help on how to fix it. Any suggestions?

I just tested your snippet in PyMC V5 and it worked fine. So a simple solution is to upgrade. The Mixture class has been improved quite a lot since PyMC3, specially concerning shape issues.

Only changes needed were:

import pymc as pm

And the sd arguments used in your first model are now called sigma.