I got the following output when running the code below. I am confused about why it does not converge even though I set such a high number of iterations and tuning steps.
Sampling 2 chains for 5_000 tune and 100_000 draw iterations (10_000 + 200_000 draws total) took 5229 seconds.
There were 29526 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.24431517844207348, but should be close to 0.8. Try to increase the number of tuning steps.
There were 42740 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.4470873562701841, but should be close to 0.8. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
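(My current suspicion, for what it's worth: the chains may be exploring permuted labelings of the same mixture. The mixture likelihood is exactly invariant under permuting the components, so the posterior has 3! symmetric modes, which would explain the large rhat across chains. A quick NumPy check of that symmetry, independent of PyMC:)

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
data = np.hstack([st.norm(loc=m).rvs(size=100, random_state=rng) for m in (0, 1, 2)])

def mixture_loglik(data, mu, sigma, w):
    # log p(x) = log sum_k w_k N(x | mu_k, sigma_k), summed over data points
    comp = st.norm.logpdf(data[:, None], loc=mu, scale=sigma) + np.log(w)
    return np.logaddexp.reduce(comp, axis=1).sum()

mu, sigma, w = np.array([0.0, 1.0, 2.0]), np.ones(3), np.ones(3) / 3
perm = [2, 0, 1]  # relabel the components
ll_orig = mixture_loglik(data, mu, sigma, w)
ll_perm = mixture_loglik(data, mu[perm], sigma[perm], w[perm])
# the two values agree: the component labeling is not identified
```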
import numpy as np
import scipy as sp
import pandas as pd
import pymc3 as pm
# generate data
n_components = 3
data = np.hstack([sp.stats.norm(loc=0).rvs(size=100), sp.stats.norm(loc=1).rvs(size=100), sp.stats.norm(loc=2).rvs(size=100)])
# GMM model by PyMC
with pm.Model() as gauss_mix:
    mu = pm.Normal("mu", 0, 10, shape=n_components)
    sigma = pm.HalfNormal("sigma", 10, shape=n_components)
    weights = pm.Dirichlet("weights", np.ones(n_components))
    likelihood = pm.NormalMixture("likelihood", w=weights, mu=mu, sigma=sigma, observed=data)
    trace = pm.sample(100000, tune=5000, cores=1, progressbar=False)
summary = pm.summary(trace)
estimate = summary['mean']
result = pd.DataFrame({'mu': [estimate['mu[%d]' % k] for k in range(n_components)],
                       'sigma': [estimate['sigma[%d]' % k] for k in range(n_components)],
                       'weights': [estimate['weights[%d]' % k] for k in range(n_components)],
                       }, index=['class %d' % k for k in range(n_components)])
print(result)