Slow MvNormal sampling with multiple cores

I’m trying to fit a simple MvNormal model. Initially, I’m using a small synthetic dataset to make sure I have things set up correctly.

Here’s the test model, where I have a two-dimensional dataset with 1,000 rows:

import numpy as np
import scipy as sp
import pymc as pm
import arviz as az

mvn = sp.stats.multivariate_normal(
    np.array([-0.1, 1]),
    [[1.0, 0.5],
     [0.5, 2.0]],
)
mvdata = mvn.rvs(1000)
n, m = mvdata.shape

with pm.Model() as model:
    chol, corr, stds = pm.LKJCholeskyCov(
        'chol_cov', n=m, eta=2.0,
        sd_dist=pm.Exponential.dist(1.0, shape=m),
    )
    cov = pm.Deterministic('cov', chol.dot(chol.T))
    mu = pm.StudentT('mu', mu=0, nu=2, shape=m)
    obs = pm.MvNormal("obs", mu=mu, chol=chol, observed=mvdata, shape=(n, m))

When I sample using 4 chains and 1 core, it goes quickly and smoothly (other than the self.vm() warning that I don’t understand).

with model:
    post = pm.sample(chains=4, cores=1)

But when I switch to 4 cores, it takes (comparatively) forever:

with model:
    post = pm.sample(chains=4, cores=4)

That’s about 54 times longer using 4 times as many cores…?

I’m not sure how to even start diagnosing this issue, as I’m totally unfamiliar with the backend libraries. My real application is a much larger dataset though, so I want to understand how to get the best performance. Any insights?
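For reference, here is the first diagnostic step I could think of, under my assumption that the worker processes might be fighting over CPU threads: check how many CPUs are visible and which BLAS/LAPACK backend NumPy is linked against, since a multithreaded OpenBLAS or MKL running inside each of the 4 worker processes could oversubscribe the machine.

```python
import multiprocessing

import numpy as np

# How many CPUs the sampler processes have to share.
print("CPUs available:", multiprocessing.cpu_count())

# Which BLAS/LAPACK libraries NumPy was built against; a multithreaded
# backend in every worker process is a common cause of this slowdown.
np.show_config()
```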

The package versions:

numpy version:  1.24.3
scipy version:  1.10.1
pymc version:  5.6.1
arviz version:  0.16.1

Is this FAQ entry helpful? Frequently Asked Questions - #19 by ricardoV94


Yes, thanks for pointing me there. I didn’t turn it up in my initial searching. I’m guessing it’s conflicting processes/threads.
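If it is thread contention, one common mitigation (a sketch, assuming a BLAS build that honors these environment variables) is to cap the native thread pools to one thread per process before NumPy is imported, so the 4 sampler processes don’t each spawn a full set of BLAS threads:

```python
import os

# Set these *before* importing NumPy/PyMC; which variable applies depends
# on the BLAS build (OpenMP-based, OpenBLAS, or MKL).
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np  # NumPy's BLAS now starts with a single thread
```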