I’m a beginner and I’ve been following the example “Dirichlet process mixtures for density estimation” on the PyMC3 website:
https://docs.pymc.io/notebooks/dp_mix.html
So that the convergence problem is reproducible, I’m posting my code here.
from matplotlib import pyplot as plt
import numpy as np
import pymc3 as pm
import scipy as sp
import seaborn as sns
from theano import tensor as tt
import pandas as pd
blue, *_ = sns.color_palette()
SEED = 5132290 # from random.org
np.random.seed(SEED)
N = 20
K = 30
alpha = 2.
P0 = sp.stats.norm
beta = sp.stats.beta.rvs(1, alpha, size=(N, K))
w = np.empty_like(beta)
w[:, 0] = beta[:, 0]
w[:, 1:] = beta[:, 1:]*(1 - beta[:, :-1]).cumprod(axis=1)
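# Quick sanity check (my own addition, not in the notebook): the truncated
# stick-breaking weights should sum to a bit less than 1, and the deficit
# shrinks as K grows.
print(w.sum(axis=1))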
omega = P0.rvs(size=(N, K))
x_plot = np.linspace(-3, 3, 200)
sample_cdfs = (w[..., np.newaxis]*np.less.outer(omega, x_plot)).sum(axis=1)
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x_plot, sample_cdfs.T, c='gray', alpha=0.75, label='DP sample CDFs');
ax.plot(x_plot, P0.cdf(x_plot), c='k', label='Base CDF');
ax.set_title(r'$\alpha = {}$'.format(alpha));
ax.legend(loc=2);
fig, (l_ax, r_ax) = plt.subplots(ncols=2, sharex=True, sharey=True, figsize=(16, 6))
K = 50
alpha = 10.
beta = sp.stats.beta.rvs(1, alpha, size=(N, K))
w = np.empty_like(beta)
w[:, 0] = beta[:, 0]
w[:, 1:] = beta[:, 1:]*(1 - beta[:, :-1]).cumprod(axis=1)
omega = P0.rvs(size=(N, K))
sample_cdfs = (w[..., np.newaxis] * np.less.outer(omega, x_plot)).sum(axis=1)
l_ax.plot(x_plot, sample_cdfs[0], c='gray', alpha=0.75, label='DP sample CDFs')
l_ax.plot(x_plot, sample_cdfs[1:].T, c='gray', alpha=0.75)
l_ax.plot(x_plot, P0.cdf(x_plot), c='k', label='Base CDF')
l_ax.set_title(r'$\alpha = {}$'.format(alpha))
l_ax.legend(loc=2)
K = 200
alpha = 50.
beta = sp.stats.beta.rvs(1, alpha, size=(N, K))
w = np.empty_like(beta)
w[:, 0] = beta[:, 0]
w[:, 1:] = beta[:, 1:] * (1 - beta[:, :-1]).cumprod(axis=1)
omega = P0.rvs(size=(N, K))
sample_cdfs = (w[..., np.newaxis] * np.less.outer(omega, x_plot)).sum(axis=1)
r_ax.plot(x_plot, sample_cdfs[0], c='gray', alpha=0.75, label='DP sample CDFs')
r_ax.plot(x_plot, sample_cdfs[1:].T, c='gray', alpha=0.75)
r_ax.plot(x_plot, P0.cdf(x_plot), c='k', label='Base CDF')
r_ax.set_title(r'$\alpha = {}$'.format(alpha))
r_ax.legend(loc=2);
N = 5
K = 30
alpha = 2
P0 = sp.stats.norm
f = lambda x, theta: sp.stats.norm.pdf(x, theta, 0.3)
beta = sp.stats.beta.rvs(1, alpha, size=(N, K))
w = np.empty_like(beta)
w[:, 0] = beta[:, 0]
w[:, 1:] = beta[:, 1:] * (1 - beta[:, :-1]).cumprod(axis=1)
theta = P0.rvs(size=(N, K))
dpm_pdf_components = f(x_plot[np.newaxis, np.newaxis, :], theta[..., np.newaxis])
dpm_pdfs = (w[..., np.newaxis] * dpm_pdf_components).sum(axis=1)
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x_plot, dpm_pdfs.T, c='gray');
ax.set_yticklabels([]);
old_faithful_df = pd.read_csv(pm.get_data('old_faithful.csv'))
old_faithful_df['std_waiting'] = (old_faithful_df.waiting - old_faithful_df.waiting.mean()) / old_faithful_df.waiting.std()
old_faithful_df.head()
fig, ax = plt.subplots(figsize=(8, 6))
n_bins = 20
ax.hist(old_faithful_df.std_waiting, bins=n_bins, color=blue, lw=0, alpha=0.5);
ax.set_xlabel('Standardized waiting time between eruptions');
ax.set_ylabel('Number of eruptions');
def stick_breaking(beta):
    portion_remaining = tt.concatenate([[1], tt.extra_ops.cumprod(1 - beta)[:-1]])
    return beta * portion_remaining
with pm.Model() as model:
    alpha = pm.Gamma('alpha', 1., 1.)
    beta = pm.Beta('beta', 1., alpha, shape=K)
    w = stick_breaking(beta)
    w = pm.Deterministic('w', w / w.sum(keepdims=True))

    tau = pm.Gamma('tau', 1., 1., shape=K)
    lambda_ = pm.Uniform('lambda_', 0, 5, shape=K)
    mu = pm.Normal('mu', 0, tau=lambda_ * tau, shape=K)
    obs = pm.NormalMixture('obs', w, mu, tau=lambda_ * tau,
                           observed=old_faithful_df.std_waiting.values)
with model:
    trace = pm.sample(1000, cores=1)
The sampler doesn’t converge; the output below shows the details:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [mu, lambda_, tau, beta, alpha]
100%|██████████████████████████████████████| 1500/1500 [02:02<00:00, 12.26it/s]
100%|██████████████████████████████████████| 1500/1500 [01:59<00:00, 14.55it/s]
C:\Users\PrettyGirlFeiWu\Anaconda3\lib\site-packages\mkl_fft\_numpy_fft.py:1044: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  output = mkl_fft.rfftn_numpy(a, s, axes)
There were 560 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.3341195732727439, but should be close to 0.8. Try to increase the number of tuning steps.
There were 649 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.5850455150818334, but should be close to 0.8. Try to increase the number of tuning steps.
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
To fix this problem, I resorted to the reparameterization method and transformed the model to a non-centered version.
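(As a reminder, the generic non-centered trick looks like this on a toy normal; this sketch is my own illustration, not from the notebook:)
with pm.Model() as toy_model:
    # hypothetical toy example, only to illustrate the transformation
    sigma = pm.HalfNormal('sigma', 1.)
    # centered form would be:  x = pm.Normal('x', 0., sd=sigma)
    # non-centered form: sample a standard normal, then scale it
    x_raw = pm.Normal('x_raw', 0., sd=1.)
    x = pm.Deterministic('x', sigma * x_raw)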
The modified model is shown below:
2nd Model
with pm.Model() as model:
    alpha = pm.Gamma('alpha', 1., 1.)
    beta = pm.Beta('beta', 1., alpha, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

    tau = pm.Gamma('tau', 1., 1., shape=K)
    lambda_ = pm.Uniform('lambda_', 0, 5, shape=K)
    mu_offset = pm.Normal('mu_offset', 0, tau=1, shape=K)
    mu = pm.Deterministic('mu', 0 + mu_offset / (lambda_ * tau))
    obs = pm.NormalMixture('obs', w, mu, tau=lambda_ * tau,
                           observed=old_faithful_df.std_waiting.values)
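I sampled it the same way as before (the sampling call isn’t shown above, but the “Sequential sampling (2 chains in 1 job)” line below matches cores=1):
with model:
    trace = pm.sample(1000, cores=1)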
The divergences still cannot be eliminated. The details are shown below:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [mu_offset, lambda_, tau, beta, alpha]
100%|██████████████████████████████████████| 1500/1500 [03:44<00:00, 7.21it/s]
100%|██████████████████████████████████████| 1500/1500 [03:19<00:00, 7.10it/s]
C:\Users\PrettyGirlFeiWu\Anaconda3\lib\site-packages\mkl_fft\_numpy_fft.py:1044: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
output = mkl_fft.rfftn_numpy(a, s, axes)
There were 112 divergences after tuning. Increase `target_accept` or reparameterize.
There were 288 divergences after tuning. Increase `target_accept` or reparameterize.
The estimated number of effective samples is smaller than 200 for some parameters.
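The output again suggests increasing `target_accept`. If I understand correctly, that would be something like the following (`nuts_kwargs` is what my PyMC3 version uses; I believe newer releases also take `target_accept` directly in `pm.sample`):
with model:
    trace = pm.sample(1000, tune=2000, cores=1,
                      nuts_kwargs={'target_accept': 0.95})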
Continuing on, here is the remaining code:
sunspot_df = pd.read_csv(pm.get_data('sunspot.csv'), sep=';', names=['time', 'sunspot.year'], usecols=[0, 1])
sunspot_df.head()
K = 30
N = sunspot_df.shape[0]
# 3rd Model
with pm.Model() as model:
    alpha = pm.Gamma('alpha', 1., 1.)
    beta = pm.Beta('beta', 1, alpha, shape=K)
    w = pm.Deterministic('w', stick_breaking(beta))

    mu = pm.Uniform('mu', 0., 5., shape=K)
    obs = pm.Mixture('obs', w, pm.Poisson.dist(mu), observed=sunspot_df['sunspot.year'])

with model:
    trace = pm.sample(1000, random_seed=SEED)
This sampler doesn’t converge either; the output shows the details:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [mu, beta, alpha]
Sampling 2 chains: 100%|████████████████| 3000/3000 [01:35<00:00, 29.83draws/s]
C:\Users\PrettyGirlFeiWu\Anaconda3\lib\site-packages\mkl_fft\_numpy_fft.py:1044: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  output = mkl_fft.rfftn_numpy(a, s, axes)
There were 207 divergences after tuning. Increase `target_accept` or reparameterize.
There were 340 divergences after tuning. Increase `target_accept` or reparameterize.
The estimated number of effective samples is smaller than 200 for some parameters.
However, as far as I can tell, we cannot follow the reparameterization method above to transform this model to a non-centered version: the non-centered trick rescales a location parameter, and the Poisson rate mu is not a location-scale parameter.
Finally, my questions are:

- How can I make sampling with the 3rd model converge?
- How can I eliminate the divergences in the 2nd model?
- I really don’t feel safe with the samples because of the divergences. With the 2nd model, in the presence of divergences, is the sampling result still guaranteed to converge to the true value, or not? If not, how can I get the right one?
- I just copied the code from the example, and convergence cannot be achieved. How was convergence achieved in the example? I’m confused.
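For reference, this is how I’ve been checking the diagnostics, in case I’m misreading them (standard PyMC3 calls as far as I know; `varnames` is the keyword on my version):
print(trace['diverging'].sum())  # total number of divergent transitions
print(pm.summary(trace))         # Rhat / n_eff for each parameter
pm.traceplot(trace, varnames=['alpha']);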