You can read the LKJ example here. For a minimal example, take the following code:
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm

SEED = 3264602  # from random.org
np.random.seed(SEED)

N = 10000
μ_actual = np.array([1, -2])
Σ_actual = np.array([[0.5, -0.3],
                     [-0.3, 1.]])
x = np.random.multivariate_normal(μ_actual, Σ_actual, size=N)

with pm.Model() as model:
    packed_L = pm.LKJCholeskyCov('packed_L', n=2,
                                 eta=2., sd_dist=pm.HalfCauchy.dist(2.5))
    L = pm.expand_packed_triangular(2, packed_L)
    Σ = pm.Deterministic('Σ', L.dot(L.T))
    μ = pm.Normal('μ', 0., 10., shape=2,
                  testval=x.mean(axis=0))
    obs = pm.MvNormal('obs', μ, chol=L, observed=x)

with model:
    trace = pm.sample(random_seed=SEED, cores=4)

print(pm.summary(trace))
pm.traceplot(trace, varnames=['μ'])
plt.show()
I cannot reproduce the results, because this is all that appears:
Sampling 4 chains: 0%| | 0/4000 [00:00<?, ?draws/s]
It doesn’t sample, and I don’t know why. With my current environment (python 3.6, pymc3 3.6 and theano 1.0.4), this is what happens. If I use a conda virtual environment (python 3.7 or python 3.6, pymc3 3.6 or pymc3 3.7, and theano 1.0.4), the problem persists. The strangest part is that if I use a virtual environment created with virtualenv (python 3.6, pymc3 3.6 or pymc3 3.7, theano 1.0.4), it does sample! Of course, the usual warnings are shown:
WARNING (theano.configdefaults): install mkl with `conda install mkl-service`: No module named 'mkl'
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
My question is simple: what is happening? Do you have the same problem if you run the code?
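In case it helps to compare environments, this is a quick way to record the exact versions in each setup (it only relies on the standard __version__ attributes, nothing specific to this problem):
import sys
import numpy, scipy, theano, pymc3
# Print the interpreter and library versions of the environment being tested.
print("python:", sys.version)
print("numpy:", numpy.__version__)
print("scipy:", scipy.__version__)
print("theano:", theano.__version__)
print("pymc3:", pymc3.__version__)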
Let me give another example. The code is:
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import theano.tensor as tt

# Parameters to set
N = 1000
mu_x = 2
variance_x = 5
mu_y = 5
variance_y = 9

mu_actual = np.array([mu_x, mu_y])
sigma_actual = np.array([[variance_x, 4], [4, variance_y]])
z = np.random.multivariate_normal(mu_actual, sigma_actual, size=N)
print(z)
print(z.shape)

with pm.Model() as model:
    # sd_dist = pm.HalfCauchy.dist(beta=2.5)
    packed_L = pm.LKJCholeskyCov('packed_L', n=2, eta=2.,
                                 sd_dist=pm.HalfCauchy.dist(beta=2.5))
    L = pm.expand_packed_triangular(2, packed_L)
    sigma_may = pm.Deterministic('sigma_may', L.dot(L.T))
    alpha = pm.Normal('alpha', mu=0, sd=10, shape=2, testval=z.mean(axis=0))
    vals = pm.MvNormal('vals', mu=alpha, chol=L, observed=z)

print(model.check_test_point())

with model:
    trace = pm.sample()

print(pm.summary(trace))
pm.traceplot(trace, varnames=['alpha'])
plt.tight_layout()
plt.show()
The above code runs in my current environment; no problem there. As you can see, it is almost the same code, so I have no idea why this one works. I have to say, though, that it does not run in my conda virtual environment (python 3.7.3, pymc3 3.7 and theano 1.0.4).
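One thing I still want to try, on the assumption (unverified on my side) that the hang could be related to how the chains are spawned rather than to the model itself, is forcing single-core sampling:
# Same model as above, but sampling on one core; cores and chains are
# standard pm.sample arguments, the multiprocessing guess is just mine.
with model:
    trace_single = pm.sample(cores=1, chains=4)
If that samples fine in the conda environment, it would point at the parallel sampling machinery rather than at the model.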
There is another problem, and for that let’s take the first example. If I use scipy 1.2.0, sampling takes about ten seconds:
Sampling 4 chains: 100%|████████████████████████████████████| 4000/4000 [00:12<00:00, 320.54draws/s]
If I use scipy 1.2.1, it takes approximately one minute:
Sampling 4 chains: 100%|█████████████████████████████████████| 4000/4000 [01:12<00:00, 55.55draws/s]
And with scipy 1.3.0, it also takes about one minute:
Sampling 4 chains: 100%|█████████████████████████████████████| 4000/4000 [01:04<00:00, 61.66draws/s]
All of this was done with virtualenv, python 3.6.6, numpy 1.16.4, pymc3 3.7 and theano 1.0.4. I don’t know whether I should open an issue on GitHub or whether this behavior is normal.
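For completeness, the times above are what the progress bar reports; an equivalent wall-clock measurement around pm.sample would look roughly like this (time is from the standard library, nothing pymc3-specific):
import time

# Wall-clock timing of the sampling step from the first example.
with model:
    start = time.time()
    trace = pm.sample(random_seed=SEED, cores=4)
print("sampling took {:.1f} s".format(time.time() - start))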