Hi all,
I’m trying to fit a hierarchical regression model where the slopes follow a bimodal distribution (a mixture of two normals) rather than being normally distributed around a group mean. Essentially each beta value should come from one of two normal distributions, but I have no prior knowledge of which distribution it comes from.
I’ve tried implementing this using a NormalMixture
as the group beta and estimating the mean of each of its two component distributions. However the component distributions tend to just collapse to the overall mean rather than giving the true bimodal distribution.
This may be a dumb way to approach this, or simply impossible, but I’d appreciate any help!
Here is a minimal example:
import pandas as pd
import pymc3 as pm
df = pd.read_csv('example_data.csv')
with pm.Model() as hierarchical_model:
# Means of component distributions
mus = pm.Normal('mixture_mus', mu=0.5, sd=0.5, shape=2)
# Group-level beta
beta_mu = pm.NormalMixture('beta_mu', [0.5, 0.5], mu=mus, sd=0.5)
# Betas
beta = pm.Normal("beta", mu=beta_mu, sd=1, shape=len(df['level2'].unique()))
# Error
eps = pm.HalfNormal('eps', 1)
# Estimated y values
y_est = beta[df['level2'].values] * df['x']
# Likelihood
y_like = pm.Normal('likelihood', y_est, sd=eps, observed=df['y'])
with hierarchical_model:
trace = pm.sample(3000, chains=1, tune=500)
And for reference, the true beta values used to simulate data (which I’d expect the group beta to look like):
example_data.csv (633.6 KB)