Mixture of Linear models

I am trying to build the following model:

(screenshot of the model: a K-component mixture of linear regressions, y ~ sum_k pi_k * Normal(w_k * x, sigma^2))

With some of the sampled data looking like the following:

import numpy as np

N = 100000
x = 2*np.random.rand(N, 1) - 1      # inputs uniform on [-1, 1]
idx = np.random.rand(N) < 0.5       # random component assignment
y = 3*x                             # component 1: slope 3
y[idx] = -x[idx]                    # component 2: slope -1
y = y + 0.1*np.random.randn(N, 1)   # additive noise with sd 0.1

The PyMC3 model looks like the following, and I fit it using minibatches for variational Bayes (so as to avoid / minimise label switching).

import theano
import theano.tensor as tt
import pymc3 as pm

def logp_normal(mu, sigma, value):
    # log-probability of the samples, summed over the batch
    delta = (value - mu).ravel()
    k = delta.shape[0]
    return -0.5 * (k * tt.log(2 * np.pi) + tt.log(sigma) + tt.square(delta).sum()/tt.square(sigma))

# Log-likelihood of the Gaussian mixture distribution
def logp_gmix(mus, pi, sigma):
    def logp_(value):
        # one weighted log-probability per component, combined with logsumexp
        logps = [tt.log(pi[i]) + logp_normal(mu, sigma, value) for i, mu in enumerate(mus)]

        return tt.sum(pm.math.logsumexp(tt.stacklists(logps), axis=0))

    return logp_

K = 2  # number of mixture components
D = 1  # dimensionality of x
X = theano.shared(x)
Y = theano.shared(y.ravel())

with pm.Model() as model:
    pi = pm.Dirichlet('pi', np.ones(K))
    ws = [pm.Normal('w_%d' % i, mu=0, sd=2, shape=(D,)) for i in range(K)]
    sigma = pm.Uniform('sigma', 0, 1)
    mus = [tt.squeeze(tt.dot(X, w)) for w in ws]   # one linear predictor per component
    y_obs = pm.DensityDist('y_obs', logp_gmix(mus, pi, sigma), observed=Y)

With the inference part looking like:

batch_size = 128
X_m = pm.Minibatch(x, batch_size)
Y_m = pm.Minibatch(y.ravel(), batch_size)

with model:
    approx = pm.fit(10000,
                    more_replacements={X: X_m, Y: Y_m},   # swap in minibatches during fitting
                    callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)],
                    method='fullrank_advi')

# Draw samples from the fitted approximation
n = 1000
trace = approx.sample(n)

However, when I look at the posteriors in the trace, the fit unfortunately does not seem to converge on the weights (3 and -1 in our case).


The posterior of sigma also concentrates on values closer to 1 than to the 0.1 used to generate the data.
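
For reference, I am inspecting the approximate posterior with the standard PyMC3 helpers, roughly like this:

# inspect the approximate posterior drawn above
print(pm.summary(trace))   # expect w_0 / w_1 near 3 and -1, sigma near 0.1
pm.traceplot(trace)        # marginal densities and traces for pi, w_0, w_1, sigma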

So my questions are:

  1. Have I specified the model wrong?
  2. Is there anything else wrong here?

Actually, VI does not avoid label switching (it does not have that concept to begin with): if you don't restrict the likelihood/posterior geometry, VI will try to approximate the same multimodal space (for a simple demonstration, see https://github.com/junpenglao/All-that-likelihood-with-PyMC3/blob/master/Notebooks/Normal_mixture_logp.ipynb). Our current VI (i.e. mean-field and full-rank ADVI) does not do what you intend (i.e. approximate only a single mode) - even when it sometimes displays that behaviour, it is down to chance. You can see some examples of the current VI having difficulty at the very end of this notebook: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/WIP/[WIP]%20Bayesian%20GMM.ipynb
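
As a rough illustration of what restricting the posterior geometry can look like (this is just my own sketch, not taken from the notebooks above - it breaks the symmetry with a hard ordering constraint on the two weights):

# sketch: same model as above, plus an ordering constraint w_0 < w_1
with pm.Model() as ordered_model:
    pi = pm.Dirichlet('pi', np.ones(K))
    ws = [pm.Normal('w_%d' % i, mu=0, sd=2, shape=(D,)) for i in range(K)]
    sigma = pm.Uniform('sigma', 0, 1)
    # -inf log-potential whenever the ordering is violated, so only one of
    # the two symmetric modes survives in the posterior
    pm.Potential('order', tt.switch(ws[1][0] - ws[0][0] > 0, 0., -np.inf))
    mus = [tt.squeeze(tt.dot(X, w)) for w in ws]
    y_obs = pm.DensityDist('y_obs', logp_gmix(mus, pi, sigma), observed=Y)

Whether a hard constraint like this plays nicely with ADVI is a separate question, but for sampling it is a common way to pin down a single labelling.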

My suggestion is to use sampling with a single chain and study the fit first, and then take it from there. A similar mixture-regression example you can have a look at is the Gaussian mixture of regressions (notebooks here: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/PyMC3QnA/mixture/Mixture_discourse.ipynb and https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/PyMC3QnA/mixture/Mixture_discourse_order.ipynb; the second one also uses VI, and you can see it is not doing a good job).
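
Concretely, by sampling with a single chain I mean something along these lines (the numbers of draws and tuning steps are just placeholders):

# NUTS on the full data (X and Y still hold the complete dataset), one chain
with model:
    trace_nuts = pm.sample(1000, tune=1000, chains=1)

pm.traceplot(trace_nuts)
pm.summary(trace_nuts)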

Last note - VI inference on mixture models classically uses conjugate, coordinate-wise updates, and my intuition is that this forces the VI approximation to converge toward a single mode. Mean-field and full-rank ADVI, which update all parameters at the same time during inference, do not achieve that.

Thanks for that!

Just a quick note while I go through the material you sent. The reason I talked about being able to avoid label switching was the fact that VI is mode-seeking. Is this not the case because of the way VI is implemented, or am I just wrong about this?

Mode-seeking behaviour is a common phenomenon but not a guarantee.