Slow Sampling in Virtual Environment

illmattic · May 8, 2023, 9:39am

Hi,

I have the following model, where df has shape (72, 39) and us_rgdp has shape (72,). It takes a long time to sample, around 10-15 minutes. I used to run an Anaconda environment in VS Code, where sampling was much quicker, but now I am using a virtual environment and I suppose that is why it's taking much longer. Are there some packages I need to install so that sampling runs faster? I am on a Windows 10 computer, if that information is needed.

with pm.Model(coords=coords) as laplace_model:
    x = pm.MutableData("x", df[:-8].values)
    y = pm.MutableData("y", us_rgdp[:-8])
    beta = pm.Laplace('beta', mu=0, b=0.1, dims=['factors'])
    alpha = pm.Normal('alpha', mu=0, sigma=0.1)

    mu = pm.Deterministic('mu', alpha + x @ beta)
    
    sigma = pm.HalfCauchy('sigma', beta=0.1)
    
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=y, shape=mu.shape)
    
    idata = pm.sample(idata_kwargs={'log_likelihood':True})

Thanks

jessegrabowski · May 8, 2023, 8:09pm

When I make fake data and run your model as written, it samples extremely fast.

I would check the shapes of all your variables to make sure they are what you expect them to be. pm.model_to_graphviz is convenient for doing this. If that’s not the problem, your priors might be too narrow. Standardizing the input data (df in your model) often helps as well (see here for a discussion on scaling/centering and priors).
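A minimal sketch of the standardization step, assuming the predictors are available as a plain 2-D NumPy array (for example df.values). The helper name standardize and the fake array are illustrative only and are not part of the original model.

```python
import numpy as np


def standardize(values):
    """Center each column at 0 and scale it to unit standard deviation."""
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    std = np.where(std == 0, 1.0, std)
    return (values - mean) / std, mean, std


# Fake stand-in for df.values with the same shape as in the question.
rng = np.random.default_rng(0)
fake_df = rng.normal(loc=2.0, scale=5.0, size=(72, 39))

x_std, x_mean, x_scale = standardize(fake_df)
print(x_std.shape)
```

If some rows are held out for forecasting, as with df[:-8] in the model above, it is usually safer to compute the mean and scale on the training rows only and reuse them for the held-out rows.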

illmattic · May 9, 2023, 12:18pm

Thanks, I widened my priors and it didn’t help. Do I need to standardize my input data if they are already year-over-year percentage changes? I tried pm.model_to_graphviz, but all I get is a graphviz.graphs.Digraph object, which I can’t display.

Can you give me your fake data code that sampled extremely fast and I’ll run the same model you had and see if I get the same problem because I think it is related to the software/machine and not the model itself. I ran it again to time it and it actually takes 48 minutes.
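On displaying the Digraph: outside a notebook the object does not draw itself, but it can be written to an image file. A minimal sketch, assuming graph is the object returned by pm.model_to_graphviz and that the Graphviz executables (such as dot) are installed and on the PATH. The helper name save_model_graph is hypothetical.

```python
def save_model_graph(graph, filename="model_graph"):
    """Render a graphviz Digraph to PNG; fall back to printing its DOT source."""
    try:
        # render() shells out to the Graphviz executables and returns
        # the path of the file it wrote.
        return graph.render(filename, format="png", cleanup=True)
    except Exception as err:  # e.g. Graphviz executables not found
        print(f"Could not render the graph ({err}); DOT source follows:")
        print(graph.source)
        return None
```

Usage would look like save_model_graph(pm.model_to_graphviz(laplace_model)). The printed DOT source can be pasted into any Graphviz viewer if rendering fails.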

jessegrabowski · May 9, 2023, 1:36pm

I just did df = np.random.normal(size=(72,39)) and us_rgdp=np.random.normal(size=(72,))
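Put together with the model from the first post, a self-contained timing script might look like the sketch below. It assumes PyMC v5 and NumPy are installed. The coords labels, the seed, and the function name time_fake_data_sampling are made up for illustration, since the original coords were not shown. It fits all 72 rows rather than holding out the last 8.

```python
import time


def time_fake_data_sampling(seed=0):
    """Fit the thread's model to random data and return the sampling time in seconds."""
    # Imported here so that defining the function does not itself require PyMC.
    import numpy as np
    import pymc as pm

    rng = np.random.default_rng(seed)
    df = rng.normal(size=(72, 39))
    us_rgdp = rng.normal(size=(72,))

    # Hypothetical labels; the original coords dictionary was not posted.
    coords = {"factors": [f"factor_{i}" for i in range(df.shape[1])]}

    with pm.Model(coords=coords):
        beta = pm.Laplace("beta", mu=0, b=0.1, dims="factors")
        alpha = pm.Normal("alpha", mu=0, sigma=0.1)
        sigma = pm.HalfCauchy("sigma", beta=0.1)
        mu = alpha + pm.math.dot(df, beta)
        pm.Normal("obs", mu=mu, sigma=sigma, observed=us_rgdp)

        start = time.perf_counter()
        pm.sample(random_seed=seed, progressbar=False)
        return time.perf_counter() - start
```

When run as a script on Windows, the call to time_fake_data_sampling() should sit under an if __name__ == "__main__": guard, because sampling several chains in parallel starts new Python processes.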

illmattic · May 9, 2023, 2:05pm

OK, I ran this and it took three and a half minutes. Does that sound reasonable, or should it be quicker than that? It is much quicker than the 48 minutes my data took.

jessegrabowski · May 9, 2023, 3:22pm

That’s still quite long; it took about 10 seconds on my machine. Knowing nothing at all about your environment, my stab in the dark would be that BLAS is not properly configured. You could try running python -m pytensor.misc.check_blas in the terminal to do a speed check. The important line to check is blas__ldflags. If it’s empty, it means you’re using numpy for linear algebra, which will cause a significant slowdown.
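The same setting can also be read from inside Python. A minimal sketch, assuming PyTensor exposes the flag as pytensor.config.blas__ldflags, matching the name printed by check_blas. The helper name get_blas_ldflags is hypothetical.

```python
def get_blas_ldflags():
    """Return PyTensor's blas__ldflags setting, or None if PyTensor is not installed."""
    try:
        import pytensor
    except ImportError:
        return None
    return pytensor.config.blas__ldflags


flags = get_blas_ldflags()
if flags is None:
    print("PyTensor is not installed in this environment.")
elif flags.strip() == "":
    print("blas__ldflags is empty: PyTensor is not linked against a BLAS library.")
else:
    print(f"blas__ldflags = {flags}")
```

Running this in the slow environment and in a known-good one is a quick way to confirm whether the two differ in their BLAS setup.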

illmattic · May 9, 2023, 3:54pm

It is blank:

Some PyTensor flags:
    blas__ldflags=
    compiledir= C:\Users\Matt\AppData\Local\PyTensor\compiledir_Windows-10-10.0.19045-SP0-Intel64_Family_6_Model_158_Stepping_10_GenuineIntel-3.11.1-64
    floatX= float64
    device= cpu

Any idea how to fix that?

GodelBayes · May 9, 2023, 4:02pm

What kind of virtual environment are you using, and how many resources have you allocated to it? Often a virtual environment only has access to some fraction of the CPU resources of the host machine. Having less processing power available would be expected to reduce your sampling speed.

Whoops, realized you almost certainly mean a Python venv, and not a virtual machine.

jessegrabowski · May 9, 2023, 4:11pm

On Windows (I’m guessing from the compiledir) it probably means that you’re missing some MKL packages. How did you install pymc? If you did the recommended conda install "pymc>=5" (quoted so the shell does not treat > as a redirect) you should get everything you need automatically.

illmattic · May 9, 2023, 5:06pm

Yes, it’s a Python venv.

illmattic · May 9, 2023, 5:08pm

I don’t have anaconda anymore so I used pip install pymc to install, I believe.

jessegrabowski · May 9, 2023, 5:11pm

If you pip install you need to install the extra stuff yourself. Try installing mkl-service>=2.3.0 and m2w64-toolchain and blas

Mamba is the recommended way to install, though.

illmattic · May 9, 2023, 6:27pm

Yeah, these packages won’t install with pip. I didn’t want to install anaconda because it takes up so much space and I thought I could get by with just using pip. Perhaps mamba is a better alternative

jessegrabowski · May 9, 2023, 6:34pm

I hated conda, but strongly recommend mamba

illmattic · May 9, 2023, 7:08pm

If I am currently using a venv for my project, would I install mamba and then install these packages (mkl-service>=2.3.0, m2w64-toolchain, blas) using mamba in the same venv?

jessegrabowski · May 9, 2023, 7:24pm

I don’t know anything about how a venv works, sorry. I’ve read that you shouldn’t mix conda and pip if possible, though. If it were me, I’d install mamba and make a fresh environment.

illmattic · May 9, 2023, 10:46pm

If I use mamba do I still need to install these packages separately or will installing pymc do the trick?

jessegrabowski · May 9, 2023, 10:52pm

You should just be able to do mamba install "pymc>=5" and you’ll be done. Or mamba create -n pymc_env "pymc>=5" (replace pymc_env with any name you like) if you want to make a new environment and install pymc into it in one go.

cluhmann · May 10, 2023, 12:45am

Conda is the officially recommended method of installation (instructions here) specifically to avoid the issues you are encountering. If you wish to avoid the bloat of conda, you can try miniconda. Or you can try micromamba if you are interested in trying out the mamba package/environment manager.

illmattic · May 10, 2023, 10:02am

Thanks, the problem is solved. It now runs quickly.
