# Error when trying mixture distribution for likelihood (updated)

Hi all,

This is an updated version of a question I asked earlier (here). I’m trying to simplify it so I wanted to start anew.

I’m using PyMC3 to do a Bayesian ANOVA and would like to move from normal distributions in my likelihood to mixture ones that better match the data.

I have 5 different groups to compare in my data, but I’m reducing it to 2 here for simplicity’s sake. Here’s some sample code that works fine using a gaussian distribution for the likelihood:

``````import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import arviz as az

#####################
# Create a sample dataset
#####################

# Create some data for two groups
ages_group01 = pd.Series(np.round(np.random.normal(scale=5, size=500)) + 55)
ages_group02 = pd.Series(np.round(np.random.normal(scale=5, size=500)) + 35)

# Organize it into a data frame
sampleDF = pd.DataFrame(0, columns=['age', 'group'], index=range(0, 1000))
sampleDF.iloc[0:500, 0] = ages_group01
sampleDF.iloc[0:500, 1] = 1
sampleDF.iloc[500:1000, 0] = ages_group02
sampleDF.iloc[500:1000, 1] = 2

# Add dummy/indicator variables into the data frame
dummyVars = pd.get_dummies(sampleDF['group'])
dummyVars.columns = ['group01', 'group02']
sampleDF = sampleDF.join(dummyVars)

#####################
# Bayesian ANOVA
#####################

DV = 'age'

with pm.Model() as model:

# Define priors
sigma = pm.HalfCauchy('sigma', beta=10)
group01 = pm.Normal('group01', mu=40, sd=10)
group02 = pm.Normal('group02', mu=40, sd=10)

likelihood = pm.Normal(DV,
mu=group01 * sampleDF['group01'] + group02 * sampleDF['group02'],
sd=sigma,
observed=sampleDF[DV])

trace = pm.sample(draws=1000, tune=1000, chains=2, target_accept=0.95, cores=8, return_inferencedata=True)
``````

However, when I change the PyMC model to the following:

``````#####################
# Bayesian ANOVA
#####################

DV = 'age'

with pm.Model() as model:

# Define priors
sigma = pm.HalfCauchy('sigma', beta=10)
group01 = pm.Normal('group01', mu=40, sd=10)
group02 = pm.Normal('group02', mu=40, sd=10)

# Create two distributions
w = pm.Dirichlet('w', a=np.array([0.3, 1]))
dist1 = pm.HalfNormal.dist(sigma=0.8)
dist2 = pm.Normal.dist(mu=group01 * sampleDF['group01'] + group02 * sampleDF['group02'], sigma=sigma)

# Define likelihood using a misture of the two distributions instead of the gaussian
likelihood = pm.Mixture(DV,
w=w,
comp_dists=[dist1, dist2],
observed=sampleDF[DV])

trace = pm.sample(draws=1000, tune=1000, chains=2, target_accept=0.95, cores=8, return_inferencedata=True)
``````

I end up getting the following error:

I can, however, use similar code to produce the mixture distribution that I’m looking for:

``````w = pm.Dirichlet('w', a=np.array([0.3, 1]))
dist1 = pm.HalfNormal.dist(sigma=0.8)
dist2 = pm.Normal.dist(mu=10, sigma=1.5)
customDist = pm.Mixture('customDist', w=w, comp_dists = [dist1, dist2])
``````

That makes a distribution something like the following:

I'm assuming the error stems from how the component distributions are being specified in the second model, but I'm looking for help on how to fix it. Any suggestions?

I just tested your snippet in PyMC V5 and it worked fine. So a simple solution is to upgrade. The Mixture class has been improved quite a lot since PyMC3, specially concerning shape issues.

Only changes needed were:

``````import pymc as pm
``````

And the `sd` arguments used in your first model are now called `sigma`.