Slow Sampling in Virtual Environment

Hi,

I have the following model, where df has shape (72, 39) and us_rgdp has shape (72,). Yet it takes a long time to sample, around 10-15 minutes. I used to run an Anaconda environment in VSCode and sampling was much quicker, but now I am using a virtual environment and I suppose that is why it's taking much longer. Are there some packages I need to install so that sampling runs faster? I am on a Windows 10 computer, if that information is needed.

with pm.Model(coords=coords) as laplace_model:
    x = pm.MutableData("x", df[:-8].values)
    y = pm.MutableData("y", us_rgdp[:-8])
    beta = pm.Laplace('beta', mu=0, b=0.1, dims=['factors'])
    alpha = pm.Normal('alpha', mu=0, sigma=0.1)

    mu = pm.Deterministic('mu', alpha + x @ beta)
    
    sigma = pm.HalfCauchy('sigma', beta=0.1)
    
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=y, shape=mu.shape)
    
    idata = pm.sample(idata_kwargs={'log_likelihood':True})

Thanks

When I make fake data and run your model as written, it samples extremely fast.

I would check the shapes of all your variables to make sure they are what you expect them to be. pm.model_to_graphviz is convenient for doing this. If that’s not the problem, your priors might be too narrow. Standardizing the input data (df in your model) often helps as well (see here for a discussion on scaling/centering and priors).
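In case it helps, here is roughly what I mean by standardizing. This is only a sketch: the helper name standardize_columns is made up for illustration, and the random array is a stand-in with the same shape as your df, not your actual data.

```python
import numpy as np


def standardize_columns(X):
    """Center each column at 0 and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std, mean, std


# Stand-in data with the same shape as df (assumption, not the real data).
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=2.0, scale=5.0, size=(72, 39))

X_std, col_mean, col_std = standardize_columns(X_raw)
print(X_std.shape)
```

Keep col_mean and col_std around so that the held-out rows (the last 8 in your case) can be scaled with the same values as the training rows.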

Thanks, I expanded my priors and it didn’t help. Do I need to standardize my input data if they are currently year-over-year percentage changes? I tried pm.model_to_graphviz, but all I get is a graphviz.graphs.Digraph object, which I can’t display.

Can you give me the fake data code that sampled extremely fast? I’ll run the same model you had and see if I get the same problem, because I think it is related to the software/machine and not the model itself. I ran it again to time it, and it actually takes 48 minutes.

I just did df = np.random.normal(size=(72,39)) and us_rgdp=np.random.normal(size=(72,))
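As for the Digraph object: outside a notebook it won't display itself, but it can be written to an image file. A minimal sketch, assuming the Graphviz executables are installed and on your PATH; save_model_graph is just a hypothetical helper name, and the file name is arbitrary.

```python
def save_model_graph(graph, filename="model_graph", fmt="png"):
    """Render a graphviz Digraph-like object to disk and return the output path."""
    # Digraph.render writes the DOT source to disk and calls the Graphviz `dot`
    # executable to produce the image; cleanup=True removes the DOT file afterwards.
    return graph.render(filename=filename, format=fmt, cleanup=True)


# Usage with the model from the question (needs pymc and the Graphviz binaries):
#   graph = pm.model_to_graphviz(laplace_model)
#   save_model_graph(graph)   # writes model_graph.png in the working directory
#   print(graph.source)       # or inspect the DOT text directly
```

If rendering fails because Graphviz itself is missing, printing graph.source still shows the variable names and shapes as text.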

OK, I ran this and it took three and a half minutes. Does that sound reasonable, or should it be quicker than that? It is much quicker than the 48 minutes my data took.

That’s still quite long; it was about 10 seconds on my machine. Knowing nothing at all about your environment, my stab in the dark would be that BLAS is not properly configured. You could try running python -m pytensor.misc.check_blas in the terminal to do a speed check. The important line to check for is blas__ldflags. If it’s empty, it means you’re using numpy for linear algebra, which will cause a significant slowdown.
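If you'd rather look at the flag from inside Python, something like this should work. It is a rough sketch, assuming the setting is exposed as pytensor.config.blas__ldflags (the same name the terminal check prints); describe_blas_flags is a made-up helper name, and it just reports a message if PyTensor isn't importable.

```python
def describe_blas_flags():
    """Return a short summary of PyTensor's BLAS linker flags."""
    try:
        import pytensor
    except ImportError:
        return "pytensor is not installed in this environment"

    flags = pytensor.config.blas__ldflags
    if not flags:
        # Empty flags mean PyTensor found no BLAS library to link against.
        return "blas__ldflags is empty: no BLAS library is linked"
    return f"blas__ldflags = {flags}"


print(describe_blas_flags())
```

Run it with the same interpreter you use for sampling, so that it reports on the environment that is actually slow.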

It is blank:

Some PyTensor flags:
    blas__ldflags=
    compiledir= C:\Users\Matt\AppData\Local\PyTensor\compiledir_Windows-10-10.0.19045-SP0-Intel64_Family_6_Model_158_Stepping_10_GenuineIntel-3.11.1-64
    floatX= float64
    device= cpu

Any idea how to fix that?

What kind of virtual environment are you using, and how many resources have you allocated to it? Often a virtual environment only has access to some fraction of the CPU resources of the host machine. Having less processing power available would be expected to reduce your sampling speed.

Whoops, realized you almost certainly mean a Python venv, and not a virtual machine.
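For what it's worth, both things are easy to check from Python. A small sketch using only the standard library; environment_summary is a hypothetical helper name. Note that the venv test only detects venv/virtualenv-style environments, not conda ones.

```python
import os
import sys


def environment_summary():
    """Report whether a venv is active and how many CPUs Python can see."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    in_venv = sys.prefix != sys.base_prefix
    return {"in_venv": in_venv, "cpu_count": os.cpu_count()}


print(environment_summary())
```

A Python venv does not restrict CPU access, so cpu_count should match the host machine; a virtual machine may show fewer cores.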

On Windows (I’m guessing from the compiledir) it probably means that you’re missing some MKL packages. How did you install pymc? If you did the recommended conda install "pymc>=5" you should get everything you need automatically.

Yes, it’s a Python venv.

I don’t have anaconda anymore so I used pip install pymc to install, I believe.

If you pip install, you need to install the extra stuff yourself. Try installing mkl-service>=2.3.0, m2w64-toolchain, and blas.

Mamba is the recommended way to install, though.

Yeah, these packages won’t install with pip. I didn’t want to install Anaconda because it takes up so much space, and I thought I could get by with just using pip. Perhaps mamba is a better alternative.

I hated conda, but strongly recommend mamba

If I am currently using a venv for my project, would I install mamba and then install these packages (mkl-service>=2.3.0, m2w64-toolchain, blas) using mamba in the same venv?

I don’t know anything about how a venv works, sorry. I’ve read that you shouldn’t mix conda and pip if possible, though. If it were me, I’d install mamba and make a fresh environment.

If I use mamba, do I still need to install these packages separately, or will installing pymc do the trick?

You should just be able to do mamba install "pymc>=5" and you’ll be done. Or mamba create -n pymc_env "pymc>=5" (or replace pymc_env with a name you like) if you want to make a new environment and install pymc into it in one go.

Conda is the officially recommended method of installation (instructions here) specifically to avoid the issues you are encountering. If you wish to avoid the bloat of conda, you can try miniconda. Or you can try micromamba if you are interested in trying out the mamba package/environment manager.

Thanks, the problem is solved. It now runs quickly.
