# Updating priors for Gaussian Mixture Model

I’ve been working with PyMC3 for a while to build a robust model from the data I have. I came across the "Updating priors" example, which works on linear-regression data. I’ve been trying to do the same for a model that uses a Gaussian mixture to describe the data, without much success.
It would be really helpful to have an example to refer to.

What kind of difficulty do you see?

I find that it is sometimes easier to approximate the posterior with a Gaussian for the next batch of updating; see, e.g., "Performance speedup for updating posterior with new data" and the code here: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/PyMC3QnA/Update%20prior%20with%20Interpolation.ipynb (see the last cell).
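As a minimal sketch of the Gaussian approximation mentioned above: summarise each scalar parameter's posterior samples by their mean and standard deviation, and reuse those as the parameters of a Normal prior for the next batch. The helper name `gaussian_summary`, the parameter name `alpha`, and the synthetic samples are illustrative assumptions, not taken from the linked notebook.

```python
import numpy as np

def gaussian_summary(samples):
    """Return (mean, sd) of 1D posterior samples, ignoring NaNs."""
    samples = np.asarray(samples, dtype=float)
    samples = samples[~np.isnan(samples)]
    return float(samples.mean()), float(samples.std(ddof=1))

# Synthetic stand-in for trace['alpha'] from a previous fit (assumption).
rng = np.random.default_rng(0)
fake_posterior = rng.normal(loc=2.0, scale=0.5, size=5000)

prior_mu, prior_sd = gaussian_summary(fake_posterior)

# In the next model these would feed a prior such as
#   alpha = pm.Normal('alpha', prior_mu, prior_sd)
```

This is much cheaper than a kernel density estimate, at the cost of losing any skew or multimodality in the posterior.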

If you have bounded variables, it is a bit more difficult, as you need to work out the inverse projection from the unconstrained parameter space back to the bounded space.
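To illustrate the point about bounded variables: for a positive parameter such as a precision, one option is to fit the Gaussian to the log of the samples and map back with `exp`. This sketch assumes a plain log transform, which is what PyMC3 applies to positive-only distributions such as Gamma; the names `forward` and `backward` and the synthetic samples are illustrative.

```python
import numpy as np

def forward(x):
    """Map a positive value to the unconstrained real line."""
    return np.log(x)

def backward(z):
    """Inverse projection: map an unconstrained value back to (0, inf)."""
    return np.exp(z)

rng = np.random.default_rng(1)
# Synthetic stand-in for one component of trace['tau'] (assumption).
tau_samples = rng.gamma(shape=4.0, scale=0.5, size=5000)

# Fit the Gaussian in the unconstrained (log) space.
z = forward(tau_samples)
z_mu, z_sd = z.mean(), z.std(ddof=1)

# A Normal(z_mu, z_sd) on log(tau) is a log-normal on tau, so the next
# prior could be written as pm.Lognormal('tau', mu=z_mu, sigma=z_sd)
# (the keyword is sd= in older PyMC3 releases).
new_draws = backward(rng.normal(z_mu, z_sd, size=5000))
```

The Dirichlet weights need a different transform, since they are constrained to sum to 1 rather than just to be positive.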

Thanks for answering. I am using weights (a Dirichlet distribution) in my model, so when I sample, the parameters (mu, sigma) come out as multidimensional arrays. But Interpolated only gives a 1D interpolation of the points, so I get dimension errors.
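The shape problem can be made concrete with a small sketch: for a two-component mixture, `trace['mu']` has shape `(n_samples, 2)`, whereas a 1D interpolation grid needs one column at a time. The snippet below builds one `(x, y)` grid per component from synthetic samples, using a hand-rolled Gaussian kernel density estimate with Scott's bandwidth rule in place of `scipy.stats.gaussian_kde`; the function name and data are illustrative assumptions.

```python
import numpy as np

def density_grid(samples, n_points=100):
    """Return (x, y): an evenly spaced grid and a Gaussian KDE evaluated on it."""
    samples = np.asarray(samples, dtype=float)
    samples = samples[~np.isnan(samples)]
    x = np.linspace(samples.min(), samples.max(), n_points)
    h = samples.std(ddof=1) * samples.size ** (-1 / 5)  # Scott's rule, 1D
    z = (x[:, None] - samples[None, :]) / h
    y = np.exp(-0.5 * z**2).sum(axis=1) / (samples.size * h * np.sqrt(2 * np.pi))
    return x, y

rng = np.random.default_rng(3)
# Synthetic stand-in for trace['mu'] with two mixture components (assumption).
mu_samples = rng.normal([0.0, 5.0], [0.3, 0.6], size=(2000, 2))

# One 1D grid per mixture component (column).
grids = [density_grid(mu_samples[:, k]) for k in range(mu_samples.shape[1])]

# Each pair could then define its own scalar prior, e.g.
#   mu_k = pm.Interpolated('mu_%d' % k, x_k, y_k)
# and the scalars could be combined with tt.stack([...]) (theano.tensor).
```

Treating the components independently discards any posterior correlation between them, and it does not directly apply to `w`, whose entries must sum to 1.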

You can still approximate them in the latent space as above, but in this case it might be better to use an MvNormal.
Maybe you can put up a notebook with some simulated data, and I can give you some more pointers.
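A rough sketch of the MvNormal idea: stack the latent-space samples of all parameters column-wise, then take their mean vector and covariance matrix as the parameters of a multivariate normal prior. The arrays below are synthetic stand-ins for `trace['mu']` and the log of `trace['tau']`, and the helper name `mvnormal_summary` is hypothetical.

```python
import numpy as np

def mvnormal_summary(*blocks):
    """Stack (n_samples, k_i) blocks column-wise; return mean vector and covariance."""
    stacked = np.column_stack(blocks)
    return stacked.mean(axis=0), np.cov(stacked, rowvar=False)

rng = np.random.default_rng(2)
mu_samples = rng.normal([0.0, 5.0], [0.1, 0.2], size=(4000, 2))
log_tau_samples = rng.normal([0.3, -0.2], 0.05, size=(4000, 2))

mean_vec, cov_mat = mvnormal_summary(mu_samples, log_tau_samples)

# Possible use in the next model (not run here):
#   latent = pm.MvNormal('latent', mu=mean_vec, cov=cov_mat, shape=mean_vec.size)
#   mu  = pm.Deterministic('mu', latent[:2])
#   tau = pm.Deterministic('tau', tt.exp(latent[2:]))   # tt = theano.tensor
```

Unlike per-parameter priors, the covariance matrix keeps the posterior correlations between parameters. The weights `w` would need their own transform to an unconstrained space before being added as extra columns.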

Below is the code I am working on. I get an error at the interpolation step. (I think you have already gone through my code earlier.)

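```python
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import seaborn as sns

SEED = 12345  # placeholder; the original post does not show the seed value

# list_cell_delay holds the measured cell delays and is loaded elsewhere
observed_cell_delay = list_cell_delay[:5000]

W = np.array([0.3, 0.7])
fig, ax = plt.subplots(figsize=(8, 6))

# density=True replaces the deprecated normed=True
ax.hist(observed_cell_delay, bins=30, density=True, lw=0)

with pm.Model() as model:
    w = pm.Dirichlet('w', np.array([0.3, 0.7]))
    mu = pm.Normal('mu', 0., 10., shape=W.size)
    tau = pm.Gamma('tau', 1., 1., shape=W.size)

    est_cell_del = pm.NormalMixture('est_cell_del', w, mu, tau=tau, observed=observed_cell_delay)

with model:
    step = pm.Metropolis()
    trace = pm.sample(50000, n_init=10000, tune=1000, step=step, discard_tuned_samples=True, random_seed=SEED)

with model:
    # sample_ppc was renamed sample_posterior_predictive in later PyMC3 releases
    ppc_trace = pm.sample_ppc(trace, 1000, random_seed=SEED)

# compare the posterior predictive mean with the observed data
est = ppc_trace['est_cell_del'].mean(axis=0)
real = np.asarray(observed_cell_delay)
sns.jointplot(x=real, y=est, color="g", kind="kde", space=0)
```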

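```python
import numpy as np
from pymc3 import Interpolated
from scipy import stats

def from_posterior(param, samples):
    # drop NaNs first so that min/max are well defined
    # note: boolean masking flattens a 2D input such as trace['mu'],
    # so all components are pooled into one 1D density
    samples = samples[~np.isnan(samples)]
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)

    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return Interpolated(param, x, y)
```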

```python
traces = [trace]
x = 0
for _ in range(10):
    # move on to the next batch of 5000 observations
    x = x + 5000
    observed_cell_delay = list_cell_delay[x:x + 5000]

    model = pm.Model()
    with model:
        # Priors are posteriors from previous iteration.
        # from_posterior returns scalar Interpolated priors, while trace['w'],
        # trace['mu'] and trace['tau'] each hold two components per draw;
        # this mismatch is the likely source of the dimension error.
        w = from_posterior('w', trace['w'])
        mu = from_posterior('mu', trace['mu'])
        tau = from_posterior('tau', trace['tau'])

        print(mu)
        # Likelihood (sampling distribution) of observations
        est_cell_del = pm.NormalMixture('est_cell_del', w=w, mu=mu, tau=tau, observed=observed_cell_delay)

        # draw 1000 posterior samples
        trace = pm.sample(1000)
        traces.append(trace)
```

Hi Ravinderatla,
Have you solved this problem?