Strange problems with LKJ example

You can read the LKJ example here. For a minimal example, take the following code:

import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm

SEED = 3264602 # from random.org
np.random.seed(SEED)

N = 10000

μ_actual = np.array([1, -2])
Σ_actual = np.array([[0.5, -0.3],
                     [-0.3, 1.]])

x = np.random.multivariate_normal(μ_actual, Σ_actual, size=N)

with pm.Model() as model:
    packed_L = pm.LKJCholeskyCov('packed_L', n=2,
                                 eta=2., sd_dist=pm.HalfCauchy.dist(2.5))

    L = pm.expand_packed_triangular(2, packed_L)
    Σ = pm.Deterministic('Σ', L.dot(L.T))

    μ = pm.Normal('μ', 0., 10., shape=2,
                  testval=x.mean(axis=0))
    obs = pm.MvNormal('obs', μ, chol=L, observed=x)


with model:
    trace = pm.sample(random_seed=SEED, cores=4)


print(pm.summary(trace))
pm.traceplot(trace, varnames=['μ']);
plt.show()

I cannot reproduce the results because this appears:

Sampling 4 chains:   0%|                                                | 0/4000 [00:00<?, ?draws/s]

It doesn’t sample. Why? I don’t know. With my current environment, this problem appears (python 3.6, pymc3 3.6 and theano 1.0.4). If I use a virtual environment (using conda), the problem persists (python 3.7 or python 3.6, pymc3 3.6 or pymc3 3.7, and theano 1.0.4). The most strange part is that if I use a virtual environment using virtualenv (python 3.6, pymc3 3.6 or pymc3 3.7, theano 1.0.4), it does sample! Of course, the usual warnings are shown:

WARNING (theano.configdefaults): install mkl with `conda install mkl-service`: No module named 'mkl'
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

My question is simple: what is happening? Do you have the same problem if you run the code?

Let me give another example. The code is:

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import theano.tensor as tt

#Parameters to set
N = 1000

mu_x = 2
variance_x = 5
mu_y = 5
variance_y = 9

mu_actual = np.array([mu_x, mu_y])
sigma_actual = np.array([[variance_x, 4], [4, variance_y]])

z = np.random.multivariate_normal(mu_actual, sigma_actual, size=N)
print(z)
print(z.shape)

with pm.Model() as model:
#    sd_dist = pm.HalfCauchy.dist(beta=2.5)

    packed_L = pm.LKJCholeskyCov('packed_L', n=2, eta=2., 
                                    sd_dist=pm.HalfCauchy.dist(beta=2.5))
    L = pm.expand_packed_triangular(2, packed_L)
    sigma_may = pm.Deterministic('sigma_may', L.dot(L.T))
    
    alpha = pm.Normal('alpha', mu=0, sd=10, shape=2, testval=z.mean(axis=0))
    
    vals = pm.MvNormal('vals', mu=alpha, chol=L, observed=z)
    
print(model.check_test_point())    
        
with model:
    trace = pm.sample()

print(pm.summary(trace))
pm.traceplot(trace, varnames=['alpha'])
plt.tight_layout()
plt.show()

With the above code, it runs with my current environment, no problem here. As you can see, it is almost the same code. Why does this work? No idea. Although I have to say that the code doesn’t run in my virtual environment (conda, python 3.7.3, pymc3 3.7 and theano 1.0.4).

There is another problem, and for that let’s take the first example. If I use scipy v. 1.2.0, it takes ten seconds:

Sampling 4 chains: 100%|████████████████████████████████████| 4000/4000 [00:12<00:00, 320.54draws/s]

If I use scipy v. 1.2.1, it takes approximately one minute:

Sampling 4 chains: 100%|█████████████████████████████████████| 4000/4000 [01:12<00:00, 55.55draws/s]

And with scipy v. 1.3.0, it takes one minute:

Sampling 4 chains: 100%|█████████████████████████████████████| 4000/4000 [01:04<00:00, 61.66draws/s]

All of this was done with virtualenv, python 3.6.6, numpy 1.16.4, pymc3 3.7 and theano 1.0.4. I don’t know if I should open an issue in Github or the behavior is normal.

No, that is not normal. Thanks for reporting this :slight_smile:
I’m really not sure what’s happening, both example work on my machine.
The second problem sounds as if it could be related to blas. (The second warning also hints at that). Maybe theano is falling back on a python implementation for some reason?
Some things you could try to help us debug this:

  • What operating system are you using?
  • Output of np.__config__.show() (For both the env with scipy 1.2.0 and scipy 1.3.0 or 1.2.1)
  • Run the first example (the one where the sampler doesn’t seem to do anything) with pm.sample(cores=1) That disables parallel sampling and might tell us if it is related to that.
  • Show the output of
func = model.logp_dlogp_function()
theano.printing.debugprint(func._theano_function)

for one scipy version where sampling is slow and for one where it is fast.

You can post that with your original post on github, that is a better place for bugs.

With pm.sample(cores=1), the sampler works, and that is weird.

I have the same problem, which is solved by setting cores=1, as suggested by @rosgori. I am running on windows, maybe it is related?