I’ve been working with PyMC3 for a while to build a robust model from the data I have. I came across the Updating priors example, which uses linear-regression-style data. I’ve been trying to do the same for a model that uses a Gaussian mixture to describe the data, but so far without much success.
It would be really helpful to have an example to refer to.
What kind of difficulty do you see?
I find that it is sometimes easier to approximate the posterior with a Gaussian for the next batch of updating, e.g. see Performance speedup for updating posterior with new data and the code here: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/PyMC3QnA/Update%20prior%20with%20Interpolation.ipynb (see the last cell)
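As a rough sketch of the Gaussian approximation idea (this is not the code from the linked notebook): summarize the posterior samples of a scalar parameter by their mean and standard deviation, and use those as the parameters of a Normal prior for the next batch. The helper name `gaussian_summary`, the parameter name `alpha`, and the synthetic samples are made up for illustration.

```python
import numpy as np

def gaussian_summary(samples):
    """Return (mean, sd) of 1-D posterior samples, ignoring NaNs."""
    samples = np.asarray(samples, dtype=float)
    samples = samples[~np.isnan(samples)]
    return samples.mean(), samples.std(ddof=1)

# Synthetic stand-in for something like trace['alpha'] from a previous fit
# (hypothetical parameter, not taken from the model in this thread).
rng = np.random.RandomState(0)
fake_posterior = rng.normal(loc=1.5, scale=0.2, size=5000)

m, s = gaussian_summary(fake_posterior)
# In the next model the prior would then be something like:
#   alpha = pm.Normal('alpha', mu=m, sd=s)
```

This only keeps the first two moments, so it is a reasonable shortcut when the posterior is roughly symmetric and unimodal, and a poor one otherwise.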
If you have bounded variables, it would be a bit more difficult as you need to work out the inverse projection from unconstrained parameter space back to the bounded space.
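To illustrate the inverse projection for a positive-only parameter such as a precision: approximate the samples on the log scale, where they are unbounded, and map back with `exp`. This is a sketch that assumes a log transform is appropriate for the bound (it fits a lower bound at zero); the function names and the synthetic samples are hypothetical.

```python
import numpy as np

def to_unconstrained(x):
    """Map positive values to the real line."""
    return np.log(x)

def to_constrained(z):
    """Inverse projection: map real values back to (0, inf)."""
    return np.exp(z)

# Synthetic stand-in for posterior samples of a positive parameter.
rng = np.random.RandomState(1)
fake_tau = rng.gamma(shape=20.0, scale=0.1, size=5000)

# Fit the Gaussian in the unconstrained (log) space.
z = to_unconstrained(fake_tau)
z_mean, z_sd = z.mean(), z.std(ddof=1)

# Draws from the approximation stay positive after mapping back.
approx_draws = to_constrained(rng.normal(z_mean, z_sd, size=5000))
```

A Normal on the log scale is a Lognormal on the original scale, so in PyMC3 the updated prior could be written as something like `pm.Lognormal('tau', mu=z_mean, sd=z_sd)`.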
Thanks for answering. I am using weights (a Dirichlet distribution) in my model, so when I sample, the parameters (mu, sigma) come out multidimensional. But Interpolated only gives a 1D interpolation of the points, so I get dimension errors.
You can still approximate them in the latent space similar to above, but in this case it might be better to use a MvNormal.
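A minimal sketch of what the latent-space MvNormal approximation could look like for a mixture with K components, using only NumPy. The log-ratio transform for the weights is one possible choice of unconstraining map, and the function names and synthetic samples are assumptions for illustration, not part of PyMC3.

```python
import numpy as np

def mixture_to_latent(w, mu, tau):
    """Stack (n_samples, K) arrays into one unconstrained (n_samples, 3K-1) array."""
    w, mu, tau = (np.asarray(a, dtype=float) for a in (w, mu, tau))
    w_latent = np.log(w[:, :-1] / w[:, -1:])   # simplex -> R^(K-1)
    tau_latent = np.log(tau)                    # positive -> R^K
    return np.hstack([w_latent, mu, tau_latent])

def latent_to_mixture(z, k):
    """Inverse projection from the latent array back to (w, mu, tau)."""
    z = np.atleast_2d(z)
    w_latent = z[:, :k - 1]
    mu = z[:, k - 1:2 * k - 1]
    tau_latent = z[:, 2 * k - 1:]
    # Append a zero for the reference component, then normalize (softmax).
    expz = np.exp(np.hstack([w_latent, np.zeros((z.shape[0], 1))]))
    w = expz / expz.sum(axis=1, keepdims=True)
    return w, mu, np.exp(tau_latent)

# Synthetic stand-ins for trace['w'], trace['mu'], trace['tau'].
rng = np.random.RandomState(2)
n, k = 4000, 2
fake_w = rng.dirichlet([30.0, 70.0], size=n)
fake_mu = rng.normal([0.0, 5.0], 0.1, size=(n, k))
fake_tau = rng.gamma(50.0, 0.02, size=(n, k))

latent = mixture_to_latent(fake_w, fake_mu, fake_tau)
latent_mean = latent.mean(axis=0)            # shape (3K-1,)
latent_cov = np.cov(latent, rowvar=False)    # shape (3K-1, 3K-1)
```

In the next model, `latent_mean` and `latent_cov` could parameterize a single `pm.MvNormal('latent', mu=latent_mean, cov=latent_cov, shape=latent_mean.size)`, with w, mu and tau rebuilt from it by the Theano equivalent of `latent_to_mixture`. If the mixture components swap labels during sampling, a single Gaussian will summarize the posterior badly.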
Maybe you can put up a notebook with some simulated data so I can give you some more pointers.
Below is the code I am working on. I get an error at the Interpolated step. (I think you have gone through my code earlier.)
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import seaborn as sns
from pymc3 import Interpolated
from scipy import stats

# list_cell_delay (the full data set) and SEED are defined elsewhere
observed_cell_delay = list_cell_delay[:5000]
W = np.array([0.3, 0.7])

fig, ax = plt.subplots(figsize=(8, 6))
ax.hist(observed_cell_delay, bins=30, density=True, lw=0)

# Initial model: two-component Gaussian mixture
with pm.Model() as model:
    w = pm.Dirichlet('w', np.array([0.3, 0.7]))
    mu = pm.Normal('mu', 0., 10., shape=W.size)
    tau = pm.Gamma('tau', 1., 1., shape=W.size)
    est_cell_del = pm.NormalMixture('est_cell_del', w, mu, tau=tau, observed=observed_cell_delay)

with model:
    step = pm.Metropolis()
    trace = pm.sample(50000, n_init=10000, tune=1000, step=step, discard_tuned_samples=True, random_seed=SEED)

# Posterior predictive check
with model:
    ppc_trace = pm.sample_ppc(trace, 1000, random_seed=SEED)

fig, ax = plt.subplots(figsize=(8, 6))
est = ppc_trace['est_cell_del'].mean(axis=0)
real = np.asarray(observed_cell_delay)
sns.jointplot(x=real, y=est, color="g", kind="kde", space=0)

def from_posterior(param, samples):
    # drop NaNs before computing the range
    samples = samples[~np.isnan(samples)]
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)

    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return Interpolated(param, x, y)

traces = [trace]
x = 0

for _ in range(10):
    # take the next batch of 5000 observations
    x = x + 5000
    observed_cell_delay = list_cell_delay[x:x+5000]

    model = pm.Model()
    with model:
        # Priors are posteriors from previous iteration
        w = from_posterior('w', trace['w'])
        mu = from_posterior('mu', trace['mu'])
        tau = from_posterior('tau', trace['tau'])
        print(mu)

        # Likelihood (sampling distribution) of observations
        est_cell_del = pm.NormalMixture('est_cell_del', w=w, mu=mu, tau=tau, observed=observed_cell_delay)

        # draw 1000 posterior samples
        trace = pm.sample(1000)
        traces.append(trace)
Hi, Ravinderatla,
Have you solved this problem?