Hello!
I have been having some problems with reusing PyMC3 models iteratively. I have seen some PyMC3 notebooks that try to do this (e.g. this), but the Interpolated() used in from_posterior() does not work for me, since my data are multidimensional. And because it is a Gaussian mixture model, gaussian_kde() is not a good choice either.
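For context, the helper from that notebook looks roughly like this (paraphrasing from memory, not verbatim); it builds a 1-D KDE of the posterior samples and wraps it in Interpolated, which is exactly the part that breaks down for multidimensional parameters:

from scipy import stats
import numpy as np
import pymc3 as pm

def from_posterior(param, samples):
    # KDE of the 1-D posterior samples, evaluated on a grid
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    # pad the support so the new prior has non-zero density slightly
    # outside the old posterior's range
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    # must be called inside a pm.Model context
    return pm.Interpolated(param, x, y)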
OK, so in the first iteration I fit a GMM where the only free parameters are the weights (the mus and sigmas are fixed). Then I sample circa 300 points from this distribution and try to fit a Gaussian mixture to them, this time with all parameters free: weights, mus, and sigmas. This does not seem to work properly. Could you please point me to a better solution? Am I making some obvious mistake?
Thank you for your time!
I attach the code and also visualizations of the “proposed fit”.
The first picture shows the proposed Gaussian mixture fit on top of the original histogram (note that the GMM curve is scaled so it is visible against the histogram counts).
The second picture shows the posterior predictive. For the most part (and in the mean, the dashed line) it looks okay, but the Gaussian in the middle concerns me…
The code:
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import arviz as az
from scipy.stats import norm

TRUE_MUS = np.array([100, 300])
TRUE_SIGS = np.array([50, 60])

# create samples from a previously built PyMC3 model
# (stand-in: a fixed two-component normal mixture with equal weights)
with pm.Model() as model:
    w = np.array([1.0 / len(TRUE_MUS) for _ in TRUE_MUS])
    y = pm.NormalMixture('y', w, mu=TRUE_MUS, sigma=TRUE_SIGS)
    trace = pm.sample(800)

burn_in = 500  # discard the first draws, where the sampler is still wandering
chain = trace[burn_in:]
ppc = pm.sample_posterior_predictive(chain, var_names=['y'], model=model)

data = ppc['y'].transpose()
n = len(data)
bins = int(np.sqrt(n))
plt.hist(data, bins=bins, label="Data")
plt.legend()
plt.show()
# fit a Gaussian mixture model to the samples drawn from the model above
with pm.Model() as GMMfit:
    starting_mu = np.mean(data)
    w = pm.Dirichlet('w', np.ones(2))
    mus = pm.Normal('mus', starting_mu, 100, shape=2)
    sigs = pm.HalfCauchy('sigs', beta=5.0, shape=2)
    GMM = pm.NormalMixture('gmm', w=w, mu=mus, sigma=sigs, observed=data)
    trace = pm.sample(1000, step=pm.Metropolis())

burn_in = 500
chain = trace[burn_in:]
ppc = pm.sample_posterior_predictive(chain, var_names=['w', 'mus', 'sigs', 'gmm'],
                                     model=GMMfit)
idata_pymc3 = az.from_pymc3(chain, posterior_predictive=ppc)
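# note: az.summary(idata_pymc3, var_names=['w', 'mus', 'sigs']) would report
# these posterior means directly; I average the raw arrays by hand below
# so I can feed them straight into the reconstructed mixture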
mus_infered = []
sigs_infered = []
weights_infered = []
for i in range(2):
    # posterior mean of each component's parameters
    mus_infered.append(np.mean(ppc['mus'].transpose()[i]))
    sigs_infered.append(np.mean(ppc['sigs'].transpose()[i]))
    weights_infered.append(np.mean(ppc['w'].transpose()[i]))
# build the predicted Gaussian mixture density from the posterior means
x = np.linspace(0, 600, 600)
y = np.zeros_like(x)
for i in range(2):
    y += norm.pdf(x, mus_infered[i], sigs_infered[i]) * weights_infered[i]
y *= 14900  # ad-hoc scale factor so the density is visible next to the histogram counts

plt.plot(x, y, label="Rec. GMM", color="tab:red", lw=2, ls="--")
plt.hist(data, bins=bins, label="Data")
plt.legend()
az.plot_ppc(idata_pymc3)
plt.show()
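As an aside on the hand-tuned 14900 above: a histogram of n points with bin width Δ has an expected bin height of roughly n · Δ · pdf(x), so the scale factor can be computed instead of tuned by eye. A sketch, assuming plt.hist's default equal-width bins over the data range (which is what the call above uses):

# instead of y *= 14900:
bin_width = (data.max() - data.min()) / bins  # width plt.hist uses by default
y *= n * bin_width  # expected counts: n samples, each bin covering bin_width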