Ok, so I nuked the environments and reinstalled them. Package info for the pymc=5.8.2 environment and its comparison to 5.9.0 and 5.9.1 is attached below. The comparison is now very clean: only the pymc and pytensor versions differ between the environments (and pytensor and its base package are the same versions!).
Next, I ran a set of tests calling the script (appended at the end) from these three environments in the terminal (I did not use Spyder this time, and I also printed pm.__version__ to make sure I was running the version I thought I was). I ran 5.9.1 with three different seeds to rule out a seed issue, and the others with one seed. As before, 5.8.2 and 5.9.0 run to completion (in similar times this round, so maybe the difference between them I saw previously was just a seed effect). 5.9.1 still gets stuck indefinitely after a while, so I interrupted it after some time. Here are the logs (note that in 5.8.2 and 5.9.0, ADVI terminates before 100% once it converges):
pymc version: 5.9.1, random_seed: 5
using advi: True
Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
^CInterrupted at 327 [0%]: Average Loss = 3,869.3-----------------------------------------------------| 0.16% [327/200000 01:11<12:09:34 Average Loss = 3,870.8]
pymc version: 5.9.1, random_seed: 111
using advi: True
Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
^CInterrupted at 420 [0%]: Average Loss = 3,738.8-----------------------------------------------------| 0.21% [420/200000 01:23<11:04:16 Average Loss = 3,748.2]
pymc version: 5.9.1, random_seed: 1111
using advi: True
Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
^CInterrupted at 98 [0%]: Average Loss = 3,830.1------------------------------------------------------| 0.05% [98/200000 00:21<11:59:44 Average Loss = 3,868.2]
pymc version: 5.8.2, random_seed: 1111
using advi: True
Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
Convergence achieved at 59400██-----------------------------------------------------------------------| 29.67% [59346/200000 03:01<07:10 Average Loss = 1,038.5]
Interrupted at 59,399 [29%]: Average Loss = 1,824.4
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [w, x_coord, y_coord, sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 665 seconds.[8000/8000 11:04<00:00 Sampling 4 chains, 0 divergences]
pymc version: 5.9.0, random_seed: 1111
using advi: True
Auto-assigning NUTS sampler...
Initializing NUTS using advi+adapt_diag...
Convergence achieved at 59400██-----------------------------------------------------------------------| 29.68% [59354/200000 02:55<06:56 Average Loss = 1,038.5]
Interrupted at 59,399 [29%]: Average Loss = 1,824.4
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [w, x_coord, y_coord, sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 657 seconds.[8000/8000 10:57<00:00 Sampling 4 chains, 0 divergences]
Next, to make sure this was not an ADVI issue, I ran the same code with just pm.sample().
With 5.8.2 and 5.9.0, sampling initially shows an estimate of ~1 hr, but it speeds up considerably after about 10 minutes and finishes in around 20 minutes (this could be something to check for people complaining that Gaussian mixtures are too slow, i.e. try ADVI initialization and also try running to the end).
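Concretely, "try ADVI initialization" here just means swapping the NUTS initializer, which is what the script at the end does when use_advi is True (a minimal sketch, reusing the model m and seed defined in that script):

# Initialize NUTS with ADVI instead of the default jitter+adapt_diag.
with m:  # `m` is the mixture model built in the script below
    idata = pm.sample(init="advi+adapt_diag", random_seed=1111)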
However, 5.9.1 again gets stuck indefinitely:
pymc version: 5.8.2, random_seed: 1111
using advi: False
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [w, x_coord, y_coord, sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1563 seconds.000/8000 26:02<00:00 Sampling 4 chains, 49 divergences]
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
There were 49 divergences after tuning. Increase `target_accept` or reparameterize.
pymc version: 5.9.1, random_seed: 1111
using advi: False
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [w, x_coord, y_coord, sigma]
^CTraceback (most recent call last):-----------------------------------------------------------| 0.95% [76/8000 06:01<10:27:32 Sampling 4 chains, 0 divergences]
pymc version: 5.9.0, random_seed: 1111
using advi: False
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [w, x_coord, y_coord, sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1570 seconds.000/8000 26:09<00:00 Sampling 4 chains, 49 divergences]
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
There were 49 divergences after tuning. Increase `target_accept` or reparameterize.
Now, going back to what @ricardoV94 asked, I did some debug prints to compare 5.9.0 and 5.9.1. Even though the traces are still identical, the logp-dlogp dumps are quite different this time, so I attached the diff as logp_diff.txt. In both cases the model.compile_logp() values (using the same seed) are identical; a sketch of the snippet I used for these prints follows the values below:
pymc version: 5.9.0, random_seed: 1111
-3115.2765648137506
[ 1.5396188 108.35295371 144.87835984 112.74718936 -1.31499805
-125.21506787 -199.13245673 -205.10570647 -125.49660322 10.42051968
25.78704799 20.52551608 -17.55547322 -39.17761053 7.88794009
3.48514039 -42.51867365 -1.21457135 -29.03688925 -9.1239996
-49.67507295 0.39696196 13.55134652 6.4564686 ]
pymc version: 5.9.1, random_seed: 1111
-3115.2765648137506
[ 1.5396188 108.35295371 144.87835984 112.74718936 -1.31499805
-125.21506787 -199.13245673 -205.10570647 -125.49660322 10.42051968
25.78704799 20.52551608 -17.55547322 -39.17761053 7.88794009
3.48514039 -42.51867365 -1.21457135 -29.03688925 -9.1239996
-49.67507295 0.39696196 13.55134652 6.4564686 ]
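For reference, the prints above came from (roughly) the following snippet, run inside each environment with the model m and random_seed from the script at the end; the exact code may have differed slightly, but the idea is to evaluate the compiled logp and its gradient at the same initial point:

# Rough sketch of the debug prints: evaluate logp and dlogp at the same initial point.
ip = m.initial_point(random_seed=random_seed)  # same point in both versions
print(m.compile_logp()(ip))    # scalar model log-probability
print(m.compile_dlogp()(ip))   # gradient w.r.t. the free (transformed) parameters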
Happy to run more diagnostics should you have any in mind (I am also going to try setting the threading environment variables to see whether that makes a difference).
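For the threading experiment, this is what I have in mind (it is an assumption on my part that these are the relevant variables): pin the usual BLAS/OpenMP thread counts before pytensor/pymc get imported.

import os

# Assumed relevant thread-count variables; set them before importing pytensor/pymc
# so the BLAS/OpenMP libraries pick them up.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import pymc as pm  # import only after the thread counts are pinned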
Finally, the code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pymc as pm
import numpy as np
import matplotlib.pyplot as plt
import pytensor.tensor as ptt
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

n_clusters = 5
data, labels = make_blobs(n_samples=1000, centers=n_clusters, random_state=10)
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
plt.scatter(*scaled_data.T, c=labels)

coords = {"cluster": np.arange(n_clusters),
          "obs_id": np.arange(data.shape[0]),
          "coord": ['x', 'y']}

# When you use the ordered transform, the initial values need to be
# monotonically increasing
sorted_initvals = np.linspace(-2, 2, 5)

pymc_version = pm.__version__
if pymc_version in ["5.8.2", "5.9.0", "5.9.1"]:
    trans = pm.distributions.transforms.ordered
else:
    raise ValueError(f"Unknown pymc version {pymc_version}")

random_seed = 1111
use_advi = False
print(f"pymc version: {pymc_version}, random_seed: {random_seed}")
print(f"using advi: {use_advi}")

with pm.Model(coords=coords) as m:
    # Use alpha > 1 to prevent the model from finding sparse solutions -- we know
    # all 5 clusters should be represented in the posterior
    w = pm.Dirichlet("w", np.full(n_clusters, 10), dims=['cluster'])

    # Mean component
    x_coord = pm.Normal("x_coord", sigma=1, dims=["cluster"],
                        transform=trans, initval=sorted_initvals)
    y_coord = pm.Normal('y_coord', sigma=1, dims=['cluster'])
    centroids = pm.Deterministic('centroids',
                                 ptt.concatenate([x_coord[None], y_coord[None]]).T,
                                 dims=['cluster', 'coord'])

    # Diagonal covariances. Could also model the full covariance matrix, but I didn't try.
    sigma = pm.HalfNormal('sigma', sigma=1, dims=['cluster', 'coord'])
    covs = [ptt.diag(sigma[i]) for i in range(n_clusters)]

    # Define the mixture
    components = [pm.MvNormal.dist(mu=centroids[i], cov=covs[i]) for i in range(n_clusters)]
    y_hat = pm.Mixture("y_hat",
                       w,
                       components,
                       observed=scaled_data,
                       dims=["obs_id", 'coord'])

    if use_advi:
        idata = pm.sample(init='advi+adapt_diag', random_seed=random_seed)
    else:
        idata = pm.sample(random_seed=random_seed)
packages.txt (13.0 KB)
logp_diff.txt (342.7 KB)
trace_5_9.txt (1.4 KB)
logp-dlogp_5.9.0.txt (200.8 KB)
trace_5_9.txt (1.4 KB)
logp-dlogp_5.9.1.txt (203.0 KB)