# Gaussian Mixture of regression

df.csv (94.7 KB)
Hello,

I’m just curious to know why I’m getting two different results in the Mixture of regression. I tried to standardize the data but still obtained different results from the two model setups. Below are codes used in the analysis. I have also attached the data.

Model 1:

``````k = 2

with pm.Model() as Mixture_regression1:

# Priors for weight parameter
π1 = pm.Uniform('π1', 0, 1)
π2 = pm.Deterministic('π2', 1. - π1)

# Priors for unknown model parameters
α1 = pm.Normal('α1', mu=0, sd=100) #Intercept
β1 = pm.Normal('β1', mu=0, sd=100)
σ1  = pm.HalfCauchy('σ1', 5)  #Noise

α2 = pm.Normal('α2', mu=0, sd=100) #Intercept
β2 = pm.Normal('β2', mu=0, sd=100)
σ2  = pm.HalfCauchy('σ2', 5)  #Noise

μ1 = α1 + β1*X
μ2 = α2 + β2*X

mu = T.stack([μ1, μ2]).T
sd = T.stack([σ1, σ2])
π = T.stack([π1, π2])

likelihood = pm.NormalMixture('likelihood', π, mu, sd=sd, observed=Y)
``````

Model 1 results:

Model 2:

``````n =  X.shape[0]
with pm.Model() as Mixture_regression2:

# Proportion in each mixture
π = pm.Dirichlet('π', np.array([1]*k), testval=np.ones(k)/k)

# Priors for unknown model parameters
α = pm.Normal('α', mu=Y.mean(), sd=100,shape=k) #Intercept
β = pm.Normal('β', mu=0, sd=100, shape=k)
σ  = pm.HalfCauchy('σ', 5,shape=k)  #Noise

# Classifying each observation
category = pm.Categorical('category', p=π, shape=n)

α1 = pm.Deterministic('α1', α[category])
β1 = pm.Deterministic('β1', β[category])
σ1 = pm.Deterministic('σ1', σ[category])

μ = α1 + β1*X

# Likelihood
likelihood = pm.Normal('likelihood', mu=μ, sd=σ1, observed=Y)

step1 = pm.Metropolis([π, α, β, σ])
step2 = pm.ElemwiseCategorical([category], values=range(k))
trace = pm.sample(niter, [step1, step2], random_seed=1345, progressbar=True)
``````

Model 2 results:

Thank you for the help!

Did you check the trace? I am getting the same result from Model 1 (also much faster with NUTS):

This is the code I am using (small rewrite so it is more compact):

``````df = pd.read_csv('mixture.csv')
X = df['X'].values[:, None]
Y = df['Y'].values
k = 2
with pm.Model() as Mixture_regression1:
# Priors for weight parameter
π = pm.Dirichlet('π', np.array([1]*k), testval=np.ones(k)/k)

# Priors for unknown model parameters
α = pm.Normal('α', mu=0, sd=100, shape=(1, k)) #Intercept
β = pm.Normal('β', mu=0, sd=100, shape=(1, k))
σ  = pm.HalfCauchy('σ', 5, shape=k)  #Noise

mu = α + β*X

likelihood = pm.NormalMixture('likelihood', π, mu, sd=σ, observed=Y)
trace = pm.sample(1000, tune=1000)
``````

OK I think I know why this happens: there are label switching in the mixture, for example, in the first trace pi=[.05, .95] but in the second trace pi=[.95, .05].

You can use an ordered transform to restrict the order of \$alpha\$: Mixture models and breaking class symmetry

[EDIT]:
After some more thought and playing with the model, I think your problem is not related to label switching (plus, you are running one chain only, right?). In fact when I reran the model I am getting the output as the model 1 in your case quite consistently. So what takes? My intuition is that there are multimode in the posterior, one around pi = [.05, .95] and one around [.35, .65]. A more informative prior should help with the model identification issue.

I put up a Gist notebook, which the two way of modelling the mixture regression converge to a similar result http://nbviewer.jupyter.org/gist/junpenglao/1907bf019906c125f63126ec4bf23880/Mixture_discourse.ipynb. It is important to run multiple chains and check the fitting.
It is also quite interesting in itself as it clearly shows that although the marginalizing and the non-marginalizing gives the same posterior, the effective sample size is much better with marginalizing.

1 Like

Thank you for your help! That right, I was using one chain in both setups. The descriptions you provided are real helpful!

One more issue that I have is when using pm.Mixture I’m getting an error. I guess the way pm.normal.dist I defined is not right for the regression:

The correct way to do in this case is `likelihood = pm.Mixture('likelihood', π, pm.Normal.dist(mu, sd), observed=Y)`

It is working. Thank you very much.