LLVM ERROR: Unable to allocate section memory!

nuwanda · March 8, 2024, 8:09am

Hello,
I recently updated from pymc3 to pymc 5 because sampling was very slow under pymc3. Under pymc 5.6.1, using the NumPyro JAX NUTS sampler, my script runs into memory problems after a few iterations.
My task: I want to fit a Bayesian polynomial regression to a dataset of 30 records (x, y).
Unfortunately, I get the following error after some iterations:
LLVM ERROR: Unable to allocate section memory!
I suspect that the memory is not being cleared over several iterations of running the script. Does anyone know how I can fix this?
Below is my script:

        poly = PolynomialFeatures(degree=2)
        x_poly = poly.fit_transform(x)
        sigma_obs = np.sqrt((1e-10 ** 4) + 0.01)
        with pytensor.config.change_flags(mode="NUMBA"):
            with pm.Model() as self.model:
                alpha = pm.Normal('alpha', mu=y[0][0], sigma=0.1)
                betas = pm.Normal('betas', mu=1, sigma=10, shape=(2 + 1,))
                sigma = pm.HalfNormal('sigma', sigma=1, testval=1.)
                mu = alpha + pm.math.dot(betas, x_poly.T)
                y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma_obs, observed=y)
                self.trace = pm.sample(250, nuts_sampler="numpyro", tune=500, chains=5, target_accept=0.9, return_inferencedata=False, random_seed=42)

jessegrabowski · March 8, 2024, 8:49am

You don’t need to set the pytensor compile mode; that’s done automatically for you when you ask for a non-standard sampler. Something odd might be going on because you’re asking for numba mode but then using numpyro, which is JAX-based. Different optimizations are applied to your model depending on the backend, so this might end up mattering.

Also, LLVM is the JIT compiler (transpiler?) that numba uses (JAX uses XLA), so I suspect the `pytensor.config.change_flags(mode="NUMBA")` line.

nuwanda · March 8, 2024, 9:08am

Thanks for the hint. I removed this line and restarted my experiments. After some time I got a different error:

AssertionError: Key not found in unpickled KeyData file. Verify the __eq__ and __hash__ functions of your Ops. The file is: X/X/X,key.pkl. the key is (((14, (3, (4,), (4,), (4,), (4,), (4,), (4,)), (13, '1.25.2'), (13, '1.25.2'), (13, '1.25.2'), (13, '1.25.2'), (13, '1.25.2'), (13, '1.25.2'), (13, '1.25.2'), ('openmp', False), ('openmp_elemwise_minsize', 200000)), ('scalar_op', 'inplace_pattern'), (11, 13, '1.25.2'), (11, 13, '1.25.2'), (11, 13, '1.25.2'), (11, 13, '1.25.2'), (11, 13, '1.25.2'), (11, 13, '1.25.2'), (11, 13, '1.25.2')), ('CLinker.cmodule_key', ('--param', '--param', '--param', '-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION', '-O3', '-Wno-c++11-narrowing', '-Wno-unused-label', '-Wno-unused-variable', '-Wno-write-strings', '-fPIC', '-fno-asynchronous-unwind-tables', '-fno-exceptions', '-fno-math-errno', '-fno-unwind-tables', '-m64', '-mabm', '-madx', '-maes', '-march=cascadelake', '-mavx', '-mavx2', '-mavx512bw', '-mavx512cd', '-mavx512dq', '-mavx512f', '-mavx512vl', '-mavx512vnni', '-mbmi', '-mbmi2', '-mclflushopt', '-mclwb', '-mcx16', '-mf16c', '-mfma', '-mfsgsbase', '-mfxsr', '-mlzcnt', '-mmmx', '-mmovbe', '-mno-3dnow', '-mno-avx5124fmaps', '-mno-avx5124vnniw', '-mno-avx512bitalg', '-mno-avx512er', '-mno-avx512ifma', '-mno-avx512pf', '-mno-avx512vbmi', '-mno-avx512vbmi2', '-mno-avx512vpopcntdq', '-mno-cldemote', '-mno-clzero', '-mno-fma4', '-mno-gfni', '-mno-hle', '-mno-lwp', '-mno-movdir64b', '-mno-movdiri', '-mno-mwaitx', '-mno-pconfig', '-mno-prefetchwt1', '-mno-ptwrite', '-mno-rdpid', '-mno-rtm', '-mno-sgx', '-mno-sha', '-mno-shstk', '-mno-sse4a', '-mno-tbm', '-mno-vaes', '-mno-vpclmulqdq', '-mno-waitpkg', '-mno-wbnoinvd', '-mno-xop', '-mpclmul', '-mpku', '-mpopcnt', '-mprfchw', '-mrdrnd', '-mrdseed', '-msahf', '-msse', '-msse2', '-msse3', '-msse4.1', '-msse4.2', '-mssse3', '-mtune=cascadelake', '-mxsave', '-mxsavec', '-mxsaveopt', '-mxsaves', 'l1-cache-line-size=64', 'l1-cache-size=32', 'l2-cache-size=25344'), (), (), 
'NPY_ABI_VERSION=0x1000009', 'c_compiler_str=/usr/bin/g++ 9', 'md5:m624562459f98c9fea8456c836a7c615e5d233a9e9905eebffeb353cd3b5f9076', (Elemwise(scalar_op=Composite{(((i3 * sqr(((i0 - i1) / i2))) - i4) - i5)},inplace_pattern=<frozendict {}>), ((TensorType(float64, shape=(30, 1)), (('med46277cfc94a2022d5c10b4ff11cc968ce1252bcef9ba480754734791ba582c', 0, 0), False)), (TensorType(float64, shape=(1, 30)), ((-1, 1), False)), (TensorType(float64, shape=(30, 1)), (('m34bdfbdba23987b235e706b6a0d15a73b0cab262dde51906583f1ad63fdcebd5', 0, 2), False)), (TensorType(float64, shape=(1, 1)), (('ma7ad8e0919c1e90250fff6895a17ab491413127adac0196cbaf0f9e3a7c1985c', 0, 3), False)), (TensorType(float64, shape=(1, 1)), (('m28ae71fd0172e8a6a2a4845226c2f780c7a9b481b6e15913a64c50e2674aab0c', 0, 4), False)), (TensorType(float64, shape=(30, 1)), (('md1a718550ff84b74c2321bb05c48df6dec3cae066766722cba912b5bf2b83e0c', 0, 5), False))), (1, (False,)))))
Apply node that caused the error: Composite{(((i3 * sqr(((i0 - i1) / i2))) - i4) - i5)}(y_obs{[[0.687196 ... 66658728]]}, Add.0, [[0.1]
 [0 ... 1]
 [0.1]], [[-0.5]], [[0.91893853]], [[-2.30258 ... 30258507]])
Toposort index: 9
Inputs types: [TensorType(float64, shape=(30, 1)), TensorType(float64, shape=(1, 30)), TensorType(float64, shape=(30, 1)), TensorType(float64, shape=(1, 1)), TensorType(float64, shape=(1, 1)), TensorType(float64, shape=(30, 1))]

HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

I am not sure what this means.

Best

jessegrabowski · March 8, 2024, 9:11am

Did you create a clean environment when you upgraded from pymc3 to 5? This looks like a really exotic error that is likely the result of a messy installation.

nuwanda · March 8, 2024, 3:50pm

I created a clean environment and updated most of my packages, but the same error still appears.

jessegrabowski · March 9, 2024, 11:37am

Can you share a minimal code snippet I can run locally that reproduces the issue?

nuwanda · March 11, 2024, 8:06am

Hey!
This should be a good minimal example:

        for iteration in range(1000):
            print(f"iteration: {iteration}")
            x = np.repeat([[0.], [0.01], [0.02]], 10, axis=0)
            y = np.random.random((30, 1))
            poly = PolynomialFeatures(degree=2)
            x_poly = poly.fit_transform(x)
            sigma_obs = np.sqrt((1e-10 ** 4) + 0.01)
            with pm.Model():
                alpha = pm.Normal('alpha', mu=y[0][0], sigma=0.1)
                betas = pm.Normal('betas', mu=1, sigma=10, shape=(2 + 1,))
                sigma = pm.HalfNormal('sigma', sigma=1, testval=1.)
                mu = alpha + pm.math.dot(betas, x_poly.T)
                y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma_obs, observed=y)
                trace = pm.sample(250, nuts_sampler="numpyro", tune=500, chains=5, target_accept=0.9,
                                       return_inferencedata=False, random_seed=42)

                for x_level in [0.0, 0.01, 0.02]:
                    x_poly = poly.transform(np.array(x_level).reshape(-1, 1))
                    alpha_samples = trace.posterior['alpha'].values.flatten()
                    beta_samples = trace.posterior['betas'].values.reshape(-1, 2 + 1)
                    y_preds = np.dot(beta_samples, x_poly.T) + alpha_samples[:, None]
                    mean_prediction = np.mean(y_preds, axis=0)
                    lower_bound, upper_bound = np.percentile(y_preds, [2.5, 97.5], axis=0)
                    print(f"mean_prediction: {mean_prediction[0]}, lower_bound: {lower_bound[0]}, upper_bound: {upper_bound[0]}")

ricardoV94 · March 11, 2024, 1:02pm

Can you include the whole code needed to run that example, including imports and any custom functions/classes?

nuwanda · March 11, 2024, 8:22pm

Sure!

from sklearn.preprocessing import PolynomialFeatures
import numpy as np
import pymc as pm
import os
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"]="false"
os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"]="platform"
for iteration in range(1000):
  print(f"iteration: {iteration}")
  x = np.repeat([[0.], [0.01], [0.02]], 10, axis=0)
  y = np.random.random((30, 1))
  poly = PolynomialFeatures(degree=2)
  x_poly = poly.fit_transform(x)
  sigma_obs = np.sqrt((1e-10 ** 4) + 0.01)
  with pm.Model():
      alpha = pm.Normal('alpha', mu=y[0][0], sigma=0.1)
      betas = pm.Normal('betas', mu=1, sigma=10, shape=(2 + 1,))
      sigma = pm.HalfNormal('sigma', sigma=1, testval=1.)
      mu = alpha + pm.math.dot(betas, x_poly.T)
      y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma_obs, observed=y)
      trace = pm.sample(250, nuts_sampler="numpyro", tune=500, chains=5, target_accept=0.9,
                             return_inferencedata=False, random_seed=42)
  
      for x_level in [0.0, 0.01, 0.02]:
          x_poly = poly.transform(np.array(x_level).reshape(-1, 1))
          alpha_samples = trace.posterior['alpha'].values.flatten()
          beta_samples = trace.posterior['betas'].values.reshape(-1, 2 + 1)
          y_preds = np.dot(beta_samples, x_poly.T) + alpha_samples[:, None]
          mean_prediction = np.mean(y_preds, axis=0)
          lower_bound, upper_bound = np.percentile(y_preds, [2.5, 97.5], axis=0)
          print(f"mean_prediction: {mean_prediction[0]}, lower_bound: {lower_bound[0]}, upper_bound: {upper_bound[0]}")

I should add that I am running this on a compute cluster, with resources managed by Slurm, and I run several of these jobs in parallel. I am currently checking whether the error also occurs on my local machine when running just one job.

ricardoV94 · March 11, 2024, 9:45pm

The cluster part could be important; see the thread "Issues with PyMC Execution using Snakemake: PyTensor Errors" (post #3 by lucianopaz).

If the C backend is giving you trouble, you can also disable it altogether with pytensor.config.cxx = ""
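If you would rather set this from the environment than in code, here is a minimal sketch. It assumes PyTensor accepts an empty `cxx=` entry in `PYTENSOR_FLAGS` the same way it accepts `pytensor.config.cxx = ""`; the helper `merge_pytensor_flags` is a hypothetical name, not part of PyTensor.

```python
import os


def merge_pytensor_flags(existing, **overrides):
    """Merge key=value pairs into a comma-separated PYTENSOR_FLAGS string.

    Hypothetical helper for illustration; not a PyTensor function.
    """
    flags = {}
    # Parse whatever is already set, skipping empty fragments.
    for pair in filter(None, existing.split(",")):
        key, _, value = pair.partition("=")
        flags[key.strip()] = value.strip()
    # New values win over existing ones.
    flags.update(overrides)
    return ",".join(f"{key}={value}" for key, value in flags.items())


# Assumption: an empty cxx value tells PyTensor not to use the C++ compiler.
# Set this before the first `import pytensor` / `import pymc`.
os.environ["PYTENSOR_FLAGS"] = merge_pytensor_flags(
    os.environ.get("PYTENSOR_FLAGS", ""), cxx=""
)
```

Merging rather than overwriting keeps any flags the job script already exported, such as a custom `compiledir`.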

nuwanda · March 12, 2024, 6:02pm

It seems to work now. I added the following line from your linked thread to my slurm script:

export PYTENSOR_FLAGS="compiledir=$HOME/.pytensor/compiledir_$(uuidgen)"

Over the day I started several experiments in parallel and I got no errors. Thank you very much for your support!
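For anyone who prefers to do this inside the Python script rather than in the Slurm script, a rough equivalent might look like the sketch below. It assumes the flags are set before `pymc` or `pytensor` is first imported; the helper name `unique_compiledir_flag` and the `local` fallback are made up for illustration.

```python
import os
import uuid
from pathlib import Path


def unique_compiledir_flag(base_dir=None):
    """Return a `compiledir=...` flag pointing at a fresh directory name.

    Hypothetical helper for illustration; not a PyTensor function.
    """
    base = Path(base_dir) if base_dir is not None else Path.home() / ".pytensor"
    # SLURM_JOB_ID is set by Slurm inside a job; use a placeholder elsewhere.
    job_id = os.environ.get("SLURM_JOB_ID", "local")
    # The random suffix keeps parallel jobs from sharing one compile cache.
    return f"compiledir={base / f'compiledir_{job_id}_{uuid.uuid4().hex}'}"


# Append to any flags that are already set instead of replacing them.
# This must happen before the first `import pymc` / `import pytensor`.
existing = os.environ.get("PYTENSOR_FLAGS", "")
os.environ["PYTENSOR_FLAGS"] = ",".join(
    filter(None, [existing, unique_compiledir_flag()])
)
```

Each job then compiles into its own directory, so the directories accumulate under the base path and may need cleaning up from time to time.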
