Updating priors for Gaussian Mixture Model


I’ve been working with PyMC3 for a while to build a robust model from the data I have. I came across the "Updating priors" example, which uses linear-regression-style data. I’ve been trying to do the same for a model that uses a Gaussian mixture model to describe the data, but without much success.
It would be really helpful to have an example to refer to.

Updating multivariate priors

What kind of difficulty are you seeing?

I find that it is sometimes easier to approximate the posterior with a Gaussian for the next batch of updates; e.g., see the thread "Performance speedup for updating posterior with new data" and the code here: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/PyMC3QnA/Update%20prior%20with%20Interpolation.ipynb (see the last cell).

If you have bounded variables, it is a bit more difficult, because you need to work out the inverse transform from the unconstrained parameter space back to the bounded space.
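A minimal sketch of the bounded case, assuming a positive parameter such as `tau` and using NumPy only: fit a Gaussian to the log of the samples (the unconstrained space), and map draws back with `exp`, which is the inverse transform. The helper names are hypothetical and the samples are simulated, not taken from a real trace. In PyMC3 terms the resulting prior is a log-normal with location `m` and scale `s` (e.g. `pm.Lognormal`).

```python
import numpy as np


def gaussian_approx_positive(samples):
    """Fit a Gaussian to positive posterior samples in log (unconstrained) space.

    Returns (m, s) such that the next-batch prior is log(param) ~ Normal(m, s),
    i.e. param ~ LogNormal(m, s).
    """
    samples = np.asarray(samples, dtype=float)
    samples = samples[~np.isnan(samples)]
    z = np.log(samples)  # forward map: (0, inf) -> real line
    return z.mean(), z.std(ddof=1)


def draw_prior(m, s, size, rng):
    """Draw from the Gaussian approximation and map back to the bounded space."""
    z = rng.normal(m, s, size=size)
    return np.exp(z)  # inverse map: real line -> (0, inf)


rng = np.random.default_rng(0)
# simulated stand-in for trace['tau'][:, k] (hypothetical data)
tau_samples = rng.gamma(shape=5.0, scale=0.2, size=5000)
m, s = gaussian_approx_positive(tau_samples)
prior_draws = draw_prior(m, s, size=1000, rng=rng)
```

Every draw lands back in (0, inf) by construction, which is the point of approximating in the unconstrained space rather than fitting a Gaussian to `tau` directly.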


Thanks for answering. I am using weights with a Dirichlet prior in my model, so when I sample, I get multidimensional values for the parameters (mu, sigma): one value per mixture component in each draw. But Interpolated only does 1-D interpolation of the points, so I get dimension errors.
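Part of the shape problem is likely that `trace['w']` (and likewise `trace['mu']`, `trace['tau']`) has shape `(n_draws, K)`. `scipy.stats.gaussian_kde` expects multivariate data as `(n_dims, n_points)`, so passing that array unchanged would be read as K points in `n_draws` dimensions, and `np.min`/`np.max` would mix the components together. The sketch below uses only NumPy and SciPy and builds one 1-D `(x, y)` grid per column instead, the kind of grid a `from_posterior`-style helper hands to `Interpolated`. The helper name `interpolation_grid`, its `lower`/`upper` clipping, and the simulated samples are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def interpolation_grid(samples, lower=-np.inf, upper=np.inf, n_points=100):
    """KDE-based (x, y) grid for a 1-D sample.

    The tails are extended by 3 * width, as in the from_posterior helper,
    but clipped to [lower, upper] so a bounded parameter keeps its support.
    """
    samples = np.asarray(samples, dtype=float)
    samples = samples[~np.isnan(samples)]
    smin, smax = samples.min(), samples.max()
    width = smax - smin
    x = np.linspace(smin, smax, n_points)
    y = stats.gaussian_kde(samples)(x)  # 1-D input: one component at a time
    x = np.concatenate([[max(smin - 3 * width, lower)], x,
                        [min(smax + 3 * width, upper)]])
    y = np.concatenate([[0.0], y, [0.0]])
    return x, y


rng = np.random.default_rng(1)
# simulated stand-in for trace['mu'] with shape (n_draws, K), K = 2
mu_samples = np.column_stack([rng.normal(-2.0, 0.1, 2000),
                              rng.normal(3.0, 0.2, 2000)])
grids = [interpolation_grid(mu_samples[:, k]) for k in range(mu_samples.shape[1])]

# a positive parameter: the lower tail is clipped at 0 instead of going negative
tau_x, tau_y = interpolation_grid(rng.gamma(5.0, 0.2, 2000), lower=0.0)
```

Each grid is then suitable for a separate 1-D prior per component; note that this treats the components as independent.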


You can still approximate them in the latent space as above, but in this case it might be better to use an MvNormal.
Maybe you can put up a notebook with some simulated data so I can give you some more pointers.
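One way to follow the MvNormal suggestion, sketched with NumPy only: map every draw of `(w, mu, tau)` to one unconstrained vector, fit a single multivariate normal (mean vector and covariance) to those vectors, and map draws back with the inverse transform. The additive log-ratio transform for `w` and the log transform for `tau` are choices made for this sketch (PyMC3 itself uses a stick-breaking transform for the Dirichlet), the helper names are hypothetical, and the input arrays are simulated stand-ins for a trace. In a model, `mean` and `cov` would parameterize a `pm.MvNormal` prior over the latent vector, with `w`, `mu` and `tau` rebuilt from it as deterministic expressions; that part is not shown.

```python
import numpy as np


def to_latent(w, mu, tau):
    """Map (n, K) draws of w, mu, tau to an (n, 3K - 1) unconstrained array.

    w   : simplex weights -> additive log-ratio log(w[:, :-1] / w[:, -1:])
    mu  : unbounded       -> unchanged
    tau : positive        -> log(tau)
    """
    alr = np.log(w[:, :-1] / w[:, -1:])
    return np.hstack([alr, mu, np.log(tau)])


def from_latent(z, K):
    """Inverse of to_latent: returns (w, mu, tau), each of shape (n, K)."""
    alr, mu, log_tau = z[:, :K - 1], z[:, K - 1:2 * K - 1], z[:, 2 * K - 1:]
    expd = np.hstack([np.exp(alr), np.ones((z.shape[0], 1))])
    w = expd / expd.sum(axis=1, keepdims=True)  # back onto the simplex
    return w, mu, np.exp(log_tau)


rng = np.random.default_rng(2)
n, K = 4000, 2
# simulated stand-ins for trace['w'], trace['mu'], trace['tau'] (hypothetical data)
w = rng.dirichlet([30.0, 70.0], size=n)
mu = np.column_stack([rng.normal(-2.0, 0.1, n), rng.normal(3.0, 0.2, n)])
tau = rng.gamma(50.0, 0.02, size=(n, K))

z = to_latent(w, mu, tau)
mean, cov = z.mean(axis=0), np.cov(z, rowvar=False)  # MvNormal approximation

# draws from the approximate prior, mapped back to the constrained space
w_new, mu_new, tau_new = from_latent(rng.multivariate_normal(mean, cov, size=500), K)
```

Keeping the full covariance is what distinguishes this from independent 1-D `Interpolated` priors: correlations between, say, a component's weight and its mean carry over into the next prior.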


Below is the code I am working on. I get an error at the Interpolated step. (I think you may have seen my code earlier.)

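    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import pymc3 as pm
    import theano.tensor as tt
    from scipy import stats

    SEED = 20180101  # any fixed seed
    K = 2            # number of mixture components
    BATCH = 5000     # observations per update step
    # list_cell_delay holds the full data set (defined elsewhere, not shown)
    list_cell_delay = np.asarray(list_cell_delay)

    observed_cell_delay = list_cell_delay[:BATCH]

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.hist(observed_cell_delay, bins=30, density=True, lw=0)  # `normed` is deprecated

    with pm.Model() as model:
        w = pm.Dirichlet('w', a=np.array([0.3, 0.7]))
        mu = pm.Normal('mu', 0., 10., shape=K)
        tau = pm.Gamma('tau', 1., 1., shape=K)
        est_cell_del = pm.NormalMixture('est_cell_del', w, mu, tau=tau,
                                        observed=observed_cell_delay)

        step = pm.Metropolis()
        # n_init only affects NUTS initialization, so it is dropped here
        trace = pm.sample(50000, tune=1000, step=step,
                          discard_tuned_samples=True, random_seed=SEED)

        # renamed pm.sample_posterior_predictive in later PyMC3 releases
        ppc_trace = pm.sample_ppc(trace, 1000, random_seed=SEED)

    est = ppc_trace['est_cell_del'].mean(axis=0)
    sns.jointplot(x=observed_cell_delay, y=est, color="g", kind="kde", space=0)

    def from_posterior(param, samples, lower=-np.inf, upper=np.inf):
        """1-D Interpolated prior from a KDE of 1-D posterior samples."""
        samples = samples[~np.isnan(samples)]  # drop NaNs before min/max
        smin, smax = np.min(samples), np.max(samples)
        width = smax - smin
        x = np.linspace(smin, smax, 100)
        y = stats.gaussian_kde(samples)(x)

        # what was never sampled should have a small probability but not 0,
        # so extend the domain (clipped to the parameter's support) and use
        # a linear approximation of the density on it
        x = np.concatenate([[max(x[0] - 3 * width, lower)], x,
                            [min(x[-1] + 3 * width, upper)]])
        y = np.concatenate([[0], y, [0]])
        return pm.Interpolated(param, x, y)

    traces = [trace]
    for i in range(1, 11):
        # next batch of data
        observed_cell_delay = list_cell_delay[i * BATCH:(i + 1) * BATCH]

        with pm.Model() as model:
            # Priors are posteriors from the previous iteration. Interpolated is
            # 1-D, so build one prior per component and stack them (this drops
            # correlations between components). Deterministic keeps 'w', 'mu'
            # and 'tau' in the trace for the next iteration.
            w0 = from_posterior('w0', trace['w'][:, 0], lower=0., upper=1.)
            w = pm.Deterministic('w', tt.stack([w0, 1. - w0]))  # K == 2 only
            mu = pm.Deterministic('mu', tt.stack(
                [from_posterior('mu%d' % k, trace['mu'][:, k]) for k in range(K)]))
            tau = pm.Deterministic('tau', tt.stack(
                [from_posterior('tau%d' % k, trace['tau'][:, k], lower=0.)
                 for k in range(K)]))

            # Likelihood (sampling distribution) of observations
            est_cell_del = pm.NormalMixture('est_cell_del', w=w, mu=mu, tau=tau,
                                            observed=observed_cell_delay)

            # draw 1000 posterior samples
            trace = pm.sample(1000, random_seed=SEED)
        traces.append(trace)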