I’m trying to fit a simple MvNormal model. To start, I’m using a small synthetic dataset to make sure I have things right.
Here’s the test model, where I have a two-dimensional dataset with 1,000 rows:
```python
import numpy as np
import scipy as sp
import pymc as pm
import arviz as az

mvn = sp.stats.multivariate_normal(
    np.array([-0.1, 1]),
    np.array([
        [1.0, 0.5],
        [0.5, 2.0],
    ])
)
mvdata = mvn.rvs(1000)
n, m = mvdata.shape

with pm.Model() as model:
    chol, corr, stds = pm.LKJCholeskyCov(
        'LKJ', eta=2, n=m,
        sd_dist=pm.Exponential.dist(1.0, shape=m),
        compute_corr=True
    )
    cov = pm.Deterministic('cov', chol.dot(chol.T))
    mu = pm.StudentT('mu', mu=0, nu=2, shape=m)
    obs = pm.MvNormal("obs", mu=mu, chol=chol, observed=mvdata, shape=(n, m))
```
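For what it’s worth, before fitting I sanity-check the synthetic data itself: the sample mean and covariance should land close to the parameters I passed to `multivariate_normal`. A minimal sketch of that check (the seed here is just something I picked, not part of the model):

```python
import numpy as np
import scipy as sp

# Regenerate the synthetic data with a fixed seed and compare its
# sample moments against the true parameters used to generate it.
rng = np.random.default_rng(42)
mvn = sp.stats.multivariate_normal(
    np.array([-0.1, 1]),
    np.array([[1.0, 0.5], [0.5, 2.0]]),
)
mvdata = mvn.rvs(1000, random_state=rng)

print(mvdata.shape)          # (1000, 2)
print(mvdata.mean(axis=0))   # should be near [-0.1, 1]
print(np.cov(mvdata.T))      # should be near [[1.0, 0.5], [0.5, 2.0]]
```

If those moments look right, any posterior that lands far from them points at the model or sampler rather than the data.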
When I sample using 4 chains and 1 core, it goes quickly and smoothly (other than a `self.vm()` warning that I don’t understand):
```python
with model:
    post = pm.sample(chains=4, cores=1)
```
But when I switch to 4 cores, it takes (comparatively) forever:
```python
with model:
    post = pm.sample(chains=4, cores=4)
```
That’s about 54 times longer using 4 times as many cores…?
I’m not sure how to even start diagnosing this, since I’m totally unfamiliar with the backend libraries. My real application involves a much larger dataset, so I want to understand how to get the best performance. Any insights?
The package versions:
```
numpy version: 1.24.3
scipy version: 1.10.1
pymc version: 5.6.1
arviz version: 0.16.1
```