Gaussian Mixture of regression

df.csv (94.7 KB)
Hello,

I’m just curious to know why I’m getting two different results from the mixture-of-regressions model. I tried standardizing the data but still obtained different results from the two model setups. Below is the code used in the analysis; I have also attached the data.

Model 1:

import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as T

df = pd.read_csv('df.csv')  # the attached data
X = df['X'].values
Y = df['Y'].values
k = 2

with pm.Model() as Mixture_regression1:

    # Priors for weight parameter
    π1 = pm.Uniform('π1', 0, 1)
    π2 = pm.Deterministic('π2', 1. - π1)
    
    # Priors for unknown model parameters
    α1 = pm.Normal('α1', mu=0, sd=100) #Intercept
    β1 = pm.Normal('β1', mu=0, sd=100) 
    σ1  = pm.HalfCauchy('σ1', 5)  #Noise
    
    α2 = pm.Normal('α2', mu=0, sd=100) #Intercept
    β2 = pm.Normal('β2', mu=0, sd=100) 
    σ2  = pm.HalfCauchy('σ2', 5)  #Noise
     
    μ1 = α1 + β1*X 
    μ2 = α2 + β2*X 

    mu = T.stack([μ1, μ2]).T
    sd = T.stack([σ1, σ2])
    π = T.stack([π1, π2])
        
    likelihood = pm.NormalMixture('likelihood', π, mu, sd=sd, observed=Y)
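For intuition, NormalMixture evaluates the marginal density p(y) = Σⱼ πⱼ · N(y | μⱼ, σⱼ) for each observation, with the μⱼ varying per observation in a regression. A minimal numpy sketch of that density (written out to avoid a scipy dependency):

```python
import numpy as np

def normal_pdf(y, mu, sd):
    # Gaussian density, written out by hand
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def mixture_pdf(y, pi, mu, sd):
    # p(y) = sum_j pi_j * N(y | mu_j, sd_j) -- the density NormalMixture
    # marginalizes over, so no discrete component labels need sampling
    return sum(p * normal_pdf(y, m, s) for p, m, s in zip(pi, mu, sd))

# two components: weights, per-observation means (from α_j + β_j·x), noise sds
dens = mixture_pdf(1.0, pi=[0.3, 0.7], mu=[0.0, 2.0], sd=[1.0, 1.0])
```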

Model 1 results:
[results image]

Model 2:

n = X.shape[0]   # number of observations
niter = 10000    # number of sampling iterations
with pm.Model() as Mixture_regression2:
    
       # Proportion in each mixture
       π = pm.Dirichlet('π', np.array([1]*k), testval=np.ones(k)/k)
    
       # Priors for unknown model parameters
       α = pm.Normal('α', mu=Y.mean(), sd=100,shape=k) #Intercept
       β = pm.Normal('β', mu=0, sd=100, shape=k) 
       σ  = pm.HalfCauchy('σ', 5,shape=k)  #Noise
    
       # Classifying each observation
       category = pm.Categorical('category', p=π, shape=n)

       α1 = pm.Deterministic('α1', α[category])
       β1 = pm.Deterministic('β1', β[category]) 
       σ1 = pm.Deterministic('σ1', σ[category])
    
       μ = α1 + β1*X 

       # Likelihood 
       likelihood = pm.Normal('likelihood', mu=μ, sd=σ1, observed=Y)
    
       step1 = pm.Metropolis([π, α, β, σ]) 
       step2 = pm.ElemwiseCategorical([category], values=range(k))
       trace = pm.sample(niter, [step1, step2], random_seed=1345, progressbar=True)
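The `α[category]` indexing in Model 2 assigns each observation the parameters of its sampled component; in plain numpy terms this is just fancy indexing:

```python
import numpy as np

alpha = np.array([0.0, 5.0])        # one intercept per component
category = np.array([0, 1, 1, 0])   # one component label per observation

# fancy indexing picks out the right intercept for each observation
alpha_per_obs = alpha[category]
```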

Model 2 results:
[results image]

Thank you for the help!

Did you check the trace? I am getting the same result from Model 1 (also much faster with NUTS):

This is the code I am using (small rewrite so it is more compact):

df = pd.read_csv('mixture.csv')
X = df['X'].values[:, None]
Y = df['Y'].values
k = 2 
with pm.Model() as Mixture_regression1:
    # Priors for weight parameter
    π = pm.Dirichlet('π', np.array([1]*k), testval=np.ones(k)/k)
    
    # Priors for unknown model parameters
    α = pm.Normal('α', mu=0, sd=100, shape=(1, k)) #Intercept
    β = pm.Normal('β', mu=0, sd=100, shape=(1, k))
    σ  = pm.HalfCauchy('σ', 5, shape=k)  #Noise
    
    mu = α + β*X
        
    likelihood = pm.NormalMixture('likelihood', π, mu, sd=σ, observed=Y)
    trace = pm.sample(1000, tune=1000)
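The compact version works because of numpy-style broadcasting: with `X` of shape `(n, 1)` and `α`, `β` of shape `(1, k)`, the expression `α + β*X` has shape `(n, k)`, one mean per observation per component — the same `mu` Model 1 built with `T.stack`. A quick numpy check of the shapes:

```python
import numpy as np

n, k = 5, 2
X = np.random.rand(n, 1)   # column vector, shape (n, 1)
alpha = np.zeros((1, k))   # intercepts, shape (1, k)
beta = np.ones((1, k))     # slopes, shape (1, k)

# (1, k) + (1, k) * (n, 1) broadcasts to (n, k):
# one regression mean per observation per mixture component
mu = alpha + beta * X
```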

OK, I think I know why this happens: there is label switching in the mixture. For example, in the first trace π = [.05, .95] but in the second trace π = [.95, .05].

You can use an ordered transform to restrict the order of α: Mixture models and breaking class symmetry
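For intuition on what breaking the symmetry buys you: without a constraint, each posterior draw can assign the components in either order, so per-component summaries average across the two labelings. A rough numpy sketch of the same idea, via post-hoc relabeling (sorting each draw's components by α — the ordered transform enforces this inside the sampler instead):

```python
import numpy as np

# fake posterior draws of the intercepts α, shape (n_draws, k):
# half the draws have the component labels swapped (label switching)
draws = np.array([[0.0, 3.0],
                  [3.1, 0.1],
                  [0.2, 2.9],
                  [2.8, -0.1]])

# naive per-component means are meaningless: both columns mix the two modes
naive_means = draws.mean(axis=0)      # both columns land near 1.5

# break the symmetry by sorting each draw's components by α
relabeled = np.sort(draws, axis=1)
fixed_means = relabeled.mean(axis=0)  # now one mean per true component
```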

[EDIT]:
After some more thought and playing with the model, I think your problem is not related to label switching (besides, you are running one chain only, right?). In fact, when I reran the model I got the same output as your Model 1 quite consistently. So what gives? My intuition is that there are multiple modes in the posterior, one around π = [.05, .95] and one around [.35, .65]. A more informative prior should help with the model identification issue.

I put up a Gist notebook in which the two ways of modelling the mixture regression converge to a similar result: http://nbviewer.jupyter.org/gist/junpenglao/1907bf019906c125f63126ec4bf23880/Mixture_discourse.ipynb. It is important to run multiple chains and check the fit.
It is also quite interesting in itself, as it clearly shows that although the marginalized and the non-marginalized models give the same posterior, the effective sample size is much better with marginalization.
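A strongly autocorrelated chain (like the Metropolis/Categorical sampler in Model 2) carries far fewer effective samples than an equally long nearly independent one (like NUTS on the marginalized model). A simplified numpy sketch of the idea, ESS ≈ n / (1 + 2·Σρₜ) summed over the initial positive autocorrelations (arviz and Stan use more careful estimators):

```python
import numpy as np

def ess(x):
    """Rough effective sample size from the autocorrelation function."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    acov = np.correlate(x, x, mode='full')[n - 1:] / n
    rho = acov / acov[0]
    # sum autocorrelations until the first negative value
    tau = 1.0
    for t in range(1, n):
        if rho[t] < 0:
            break
        tau += 2 * rho[t]
    return n / tau

rng = np.random.default_rng(0)
iid = rng.normal(size=2000)          # nearly independent draws
ar = np.empty(2000)                  # sticky AR(1) chain, mimicking a
ar[0] = 0.0                          # poorly mixing sampler
for i in range(1, 2000):
    ar[i] = 0.9 * ar[i - 1] + rng.normal()
```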


Thank you for your help! That's right, I was using one chain in both setups. The explanations you provided are really helpful!

One more issue I have is that when using pm.Mixture I get an error. I guess the way I defined pm.Normal.dist is not right for the regression:

The correct way to do it in this case is likelihood = pm.Mixture('likelihood', π, pm.Normal.dist(mu, sd), observed=Y)

It is working. Thank you very much.