LLVM ERROR: Unable to allocate section memory!

Hello,
I recently updated from pymc3 to pymc 5 because sampling was very slow under pymc3. Under pymc 5.6.1, using the NumPyro JAX NUTS sampler, my script runs into memory problems after a few iterations.
I have the following task: I have a dataset of 30 records (x, y), and I want to fit a Bayesian polynomial regression to it.
Unfortunately, I get the following error after some iterations:
LLVM ERROR: Unable to allocate section memory!
I suspect that memory is not being released between iterations of the script. Does anyone know how I can fix this?
Below is my script:

    import numpy as np
    import pymc as pm
    import pytensor
    from sklearn.preprocessing import PolynomialFeatures

    # Excerpt from a class method: x and y are NumPy arrays of shape (30, 1)
    poly = PolynomialFeatures(degree=2)
    x_poly = poly.fit_transform(x)  # columns: 1, x, x**2
    sigma_obs = np.sqrt((1e-10 ** 4) + 0.01)  # effectively 0.1
    with pytensor.config.change_flags(mode="NUMBA"):
        with pm.Model() as self.model:
            alpha = pm.Normal('alpha', mu=y[0][0], sigma=0.1)
            betas = pm.Normal('betas', mu=1, sigma=10, shape=(2 + 1,))
            sigma = pm.HalfNormal('sigma', sigma=1, initval=1.)  # `testval` is deprecated in PyMC 5
            mu = alpha + pm.math.dot(betas, x_poly.T)
            y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma_obs, observed=y)
            self.trace = pm.sample(250, nuts_sampler="numpyro", tune=500, chains=5,
                                   target_accept=0.9, random_seed=42)

You don’t need to set the PyTensor compile mode; that’s done automatically when you ask for a non-standard sampler. Something odd might be going on because you’re asking for NUMBA mode but then using NumPyro, which is JAX-based. Different optimizations are actually applied to your model depending on the backend, so this might end up mattering.

Also, LLVM is the JIT compiler (transpiler?) that numba uses (JAX uses XLA), which makes me suspect that line.

Thanks for this hint. I removed that line and restarted my experiments. After some time, I got this error instead:

    AssertionError: Key not found in unpickled KeyData file. Verify the __eq__ and __hash__ functions of your Ops. The file is: X/X/X,key.pkl. the key is (((14, (3, (4,), (4,), (4,), (4,), (4,), (4,)), (13, '1.25.2'), (13, '1.25.2'), (13, '1.25.2'), (13, '1.25.2'), (13, '1.25.2'), (13, '1.25.2'), (13, '1.25.2'), ('openmp', False), ('openmp_elemwise_minsize', 200000)), ('scalar_op', 'inplace_pattern'), (11, 13, '1.25.2'), (11, 13, '1.25.2'), (11, 13, '1.25.2'), (11, 13, '1.25.2'), (11, 13, '1.25.2'), (11, 13, '1.25.2'), (11, 13, '1.25.2')), ('CLinker.cmodule_key', ('--param', '--param', '--param', '-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION', '-O3', '-Wno-c++11-narrowing', '-Wno-unused-label', '-Wno-unused-variable', '-Wno-write-strings', '-fPIC', '-fno-asynchronous-unwind-tables', '-fno-exceptions', '-fno-math-errno', '-fno-unwind-tables', '-m64', '-mabm', '-madx', '-maes', '-march=cascadelake', '-mavx', '-mavx2', '-mavx512bw', '-mavx512cd', '-mavx512dq', '-mavx512f', '-mavx512vl', '-mavx512vnni', '-mbmi', '-mbmi2', '-mclflushopt', '-mclwb', '-mcx16', '-mf16c', '-mfma', '-mfsgsbase', '-mfxsr', '-mlzcnt', '-mmmx', '-mmovbe', '-mno-3dnow', '-mno-avx5124fmaps', '-mno-avx5124vnniw', '-mno-avx512bitalg', '-mno-avx512er', '-mno-avx512ifma', '-mno-avx512pf', '-mno-avx512vbmi', '-mno-avx512vbmi2', '-mno-avx512vpopcntdq', '-mno-cldemote', '-mno-clzero', '-mno-fma4', '-mno-gfni', '-mno-hle', '-mno-lwp', '-mno-movdir64b', '-mno-movdiri', '-mno-mwaitx', '-mno-pconfig', '-mno-prefetchwt1', '-mno-ptwrite', '-mno-rdpid', '-mno-rtm', '-mno-sgx', '-mno-sha', '-mno-shstk', '-mno-sse4a', '-mno-tbm', '-mno-vaes', '-mno-vpclmulqdq', '-mno-waitpkg', '-mno-wbnoinvd', '-mno-xop', '-mpclmul', '-mpku', '-mpopcnt', '-mprfchw', '-mrdrnd', '-mrdseed', '-msahf', '-msse', '-msse2', '-msse3', '-msse4.1', '-msse4.2', '-mssse3', '-mtune=cascadelake', '-mxsave', '-mxsavec', '-mxsaveopt', '-mxsaves', 'l1-cache-line-size=64', 'l1-cache-size=32', 'l2-cache-size=25344'), (), (), 'NPY_ABI_VERSION=0x1000009', 'c_compiler_str=/usr/bin/g++ 9', 'md5:m624562459f98c9fea8456c836a7c615e5d233a9e9905eebffeb353cd3b5f9076', (Elemwise(scalar_op=Composite{(((i3 * sqr(((i0 - i1) / i2))) - i4) - i5)},inplace_pattern=<frozendict {}>), ((TensorType(float64, shape=(30, 1)), (('med46277cfc94a2022d5c10b4ff11cc968ce1252bcef9ba480754734791ba582c', 0, 0), False)), (TensorType(float64, shape=(1, 30)), ((-1, 1), False)), (TensorType(float64, shape=(30, 1)), (('m34bdfbdba23987b235e706b6a0d15a73b0cab262dde51906583f1ad63fdcebd5', 0, 2), False)), (TensorType(float64, shape=(1, 1)), (('ma7ad8e0919c1e90250fff6895a17ab491413127adac0196cbaf0f9e3a7c1985c', 0, 3), False)), (TensorType(float64, shape=(1, 1)), (('m28ae71fd0172e8a6a2a4845226c2f780c7a9b481b6e15913a64c50e2674aab0c', 0, 4), False)), (TensorType(float64, shape=(30, 1)), (('md1a718550ff84b74c2321bb05c48df6dec3cae066766722cba912b5bf2b83e0c', 0, 5), False))), (1, (False,)))))
    Apply node that caused the error: Composite{(((i3 * sqr(((i0 - i1) / i2))) - i4) - i5)}(y_obs{[[0.687196 ... 66658728]]}, Add.0, [[0.1]
     [0 ... 1]
     [0.1]], [[-0.5]], [[0.91893853]], [[-2.30258 ... 30258507]])
    Toposort index: 9
    Inputs types: [TensorType(float64, shape=(30, 1)), TensorType(float64, shape=(1, 30)), TensorType(float64, shape=(30, 1)), TensorType(float64, shape=(1, 1)), TensorType(float64, shape=(1, 1)), TensorType(float64, shape=(30, 1))]

    HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
    HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
    HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

I am not sure what this means.

Best
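
Separately from the cache error, the input types in the traceback above point to a shape detail in the model: the observed y enters the log-likelihood as (30, 1) while the mean (Add.0) enters as (1, 30), so the Normal log-density appears to be evaluated on a broadcast 30 x 30 grid, pairing every observation with every mean. The NumPy-only sketch below reproduces that broadcasting with the shapes from the minimal example later in this thread; the parameter values are arbitrary stand-ins, and the [1, x, x**2] columns are built by hand in place of PolynomialFeatures. Passing a flattened y (for example observed=y.ravel()) would presumably be one way to keep one mean per observation; this is an observation about the posted model, not a fix for the error discussed here.

```python
import numpy as np

# Shapes follow the minimal example in this thread: x and y are both (30, 1).
x = np.repeat([[0.0], [0.01], [0.02]], 10, axis=0)
y = np.random.default_rng(0).random((30, 1))

# Hand-built equivalent of PolynomialFeatures(degree=2) on one feature: columns 1, x, x**2
x_poly = np.hstack([np.ones_like(x), x, x ** 2])  # (30, 3)

alpha = 0.5                        # arbitrary stand-in parameter values
betas = np.array([1.0, 1.0, 1.0])

mu = alpha + betas @ x_poly.T      # (30,); NumPy aligns it as (1, 30) against y
residuals = y - mu                 # (30, 1) against (1, 30) broadcasts to (30, 30)
print(residuals.shape)             # (30, 30)

# Flattening the observations pairs each y[i] with its own mu[i]
residuals_flat = y.ravel() - mu    # (30,)
print(residuals_flat.shape)        # (30,)
```

The intended per-observation residuals sit on the diagonal of the broadcast grid; the other 870 entries are the extra terms.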

Did you make a clean environment when you upgraded from pymc3 to PyMC 5? This looks like a really exotic error that is likely the result of a messy installation.

I created a clean environment and updated most of my packages, but the same error still appears.

Can you share a minimal code snippet I can run locally that reproduces the issue?

Hey!
This should be a good minimal example:

        for iteration in range(1000):
            print(f"iteration: {iteration}")
            x = np.repeat([[0.], [0.01], [0.02]], 10, axis=0)
            y = np.random.random((30, 1))
            poly = PolynomialFeatures(degree=2)
            x_poly = poly.fit_transform(x)
            sigma_obs = np.sqrt((1e-10 ** 4) + 0.01)
            with pm.Model():
                alpha = pm.Normal('alpha', mu=y[0][0], sigma=0.1)
                betas = pm.Normal('betas', mu=1, sigma=10, shape=(2 + 1,))
                sigma = pm.HalfNormal('sigma', sigma=1, initval=1.)
                mu = alpha + pm.math.dot(betas, x_poly.T)
                y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma_obs, observed=y)
                trace = pm.sample(250, nuts_sampler="numpyro", tune=500, chains=5,
                                  target_accept=0.9, random_seed=42)

                for x_level in [0.0, 0.01, 0.02]:
                    x_poly = poly.transform(np.array(x_level).reshape(-1, 1))
                    alpha_samples = trace.posterior['alpha'].values.flatten()
                    beta_samples = trace.posterior['betas'].values.reshape(-1, 2 + 1)
                    y_preds = np.dot(beta_samples, x_poly.T) + alpha_samples[:, None]
                    mean_prediction = np.mean(y_preds, axis=0)
                    lower_bound, upper_bound = np.percentile(y_preds, [2.5, 97.5], axis=0)
                    print(f"mean_prediction: {mean_prediction[0]}, lower_bound: {lower_bound[0]}, upper_bound: {upper_bound[0]}")
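
The loop over the three x levels flattens the same posterior draws on every pass; the summary can also be computed for all levels in one vectorised step. A NumPy-only sketch, assuming alpha_samples has shape (n_draws,) and beta_samples has shape (n_draws, 3) as produced by the flatten/reshape lines above; summarize_predictions is a hypothetical helper name, the [1, x, x**2] columns are built by hand instead of calling poly.transform, and the draws below are synthetic stand-ins for trace.posterior values.

```python
import numpy as np


def summarize_predictions(alpha_samples, beta_samples, x_levels):
    """Posterior mean and central 95% interval of alpha + betas @ [1, x, x**2] per x level.

    alpha_samples: shape (n_draws,); beta_samples: shape (n_draws, 3).
    """
    x = np.asarray(x_levels, dtype=float)
    # Same columns PolynomialFeatures(degree=2) produces for a single feature
    x_poly = np.column_stack([np.ones_like(x), x, x ** 2])       # (n_levels, 3)
    y_preds = alpha_samples[:, None] + beta_samples @ x_poly.T    # (n_draws, n_levels)
    mean = y_preds.mean(axis=0)
    lower, upper = np.percentile(y_preds, [2.5, 97.5], axis=0)
    return mean, lower, upper


# Synthetic draws standing in for 5 chains x 250 draws from the posterior
rng = np.random.default_rng(42)
alpha_samples = rng.normal(0.5, 0.1, size=1250)
beta_samples = rng.normal(1.0, 0.1, size=(1250, 3))

levels = [0.0, 0.01, 0.02]
mean, lower, upper = summarize_predictions(alpha_samples, beta_samples, levels)
for x_level, m, lo, hi in zip(levels, mean, lower, upper):
    print(f"x={x_level}: mean_prediction: {m:.3f}, lower_bound: {lo:.3f}, upper_bound: {hi:.3f}")
```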

Can you include the whole code needed to run that example, including imports and any custom functions/classes?

Sure!

    import os

    # XLA reads these settings when JAX initialises its backend,
    # so set them before importing pymc
    os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
    os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"

    import numpy as np
    import pymc as pm
    from sklearn.preprocessing import PolynomialFeatures

    for iteration in range(1000):
        print(f"iteration: {iteration}")
        x = np.repeat([[0.], [0.01], [0.02]], 10, axis=0)  # shape (30, 1)
        y = np.random.random((30, 1))
        poly = PolynomialFeatures(degree=2)
        x_poly = poly.fit_transform(x)  # columns: 1, x, x**2
        sigma_obs = np.sqrt((1e-10 ** 4) + 0.01)
        with pm.Model():
            alpha = pm.Normal('alpha', mu=y[0][0], sigma=0.1)
            betas = pm.Normal('betas', mu=1, sigma=10, shape=(2 + 1,))
            sigma = pm.HalfNormal('sigma', sigma=1, initval=1.)  # `testval` is deprecated in PyMC 5
            mu = alpha + pm.math.dot(betas, x_poly.T)
            y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma_obs, observed=y)
            # The numpyro sampler returns an InferenceData object
            trace = pm.sample(250, nuts_sampler="numpyro", tune=500, chains=5,
                              target_accept=0.9, random_seed=42)

            for x_level in [0.0, 0.01, 0.02]:
                x_poly = poly.transform(np.array(x_level).reshape(-1, 1))
                alpha_samples = trace.posterior['alpha'].values.flatten()
                beta_samples = trace.posterior['betas'].values.reshape(-1, 2 + 1)
                y_preds = np.dot(beta_samples, x_poly.T) + alpha_samples[:, None]
                mean_prediction = np.mean(y_preds, axis=0)
                lower_bound, upper_bound = np.percentile(y_preds, [2.5, 97.5], axis=0)
                print(f"mean_prediction: {mean_prediction[0]}, lower_bound: {lower_bound[0]}, upper_bound: {upper_bound[0]}")

I should add that I am running this on a compute cluster with resources managed by Slurm, and that I run several of these jobs in parallel. I am currently checking whether the error also occurs on my local machine when running a single job.

The cluster part could be important; see the thread "Issues with PyMC Execution using Snakemake: PyTensor Errors" (post #3 by lucianopaz).

If the C backend is giving you trouble, you can also disable it altogether by setting pytensor.config.cxx to an empty string ("").
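
The same setting can presumably also be supplied through the PYTENSOR_FLAGS environment variable, which PyTensor reads once when it is first imported. A minimal sketch under that assumption; add_pytensor_flag is a hypothetical helper, and it assumes an empty cxx value in the flag string behaves like setting pytensor.config.cxx to "".

```python
import os


def add_pytensor_flag(key: str, value: str) -> str:
    """Append key=value to PYTENSOR_FLAGS, keeping any flags that are already set.

    Call this before pytensor (or pymc) is imported for the first time:
    PyTensor reads PYTENSOR_FLAGS once, at import.
    """
    pair = f"{key}={value}"
    existing = os.environ.get("PYTENSOR_FLAGS", "")
    os.environ["PYTENSOR_FLAGS"] = f"{existing},{pair}" if existing else pair
    return os.environ["PYTENSOR_FLAGS"]


# Assumption: an empty cxx value disables the C backend, as pytensor.config.cxx = "" does.
add_pytensor_flag("cxx", "")
# import pymc as pm  # import only after the flags are in place
```

Appending rather than overwriting keeps any flags a job script has already exported, such as a compiledir override.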

It seems to work now. I added the following line from your linked thread to my Slurm script:

    export PYTENSOR_FLAGS="compiledir=$HOME/.pytensor/compiledir_$(uuidgen)"
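
For jobs that set their environment from Python rather than from the shell, a rough equivalent of that export is to build the flag with Python's uuid module before pytensor or pymc is first imported. A minimal sketch; unique_compiledir_flag is a hypothetical helper name, and, like the shell line, it replaces any PYTENSOR_FLAGS already set.

```python
import os
import uuid
from pathlib import Path
from typing import Optional


def unique_compiledir_flag(base: Optional[Path] = None) -> str:
    """Return a PYTENSOR_FLAGS entry pointing at a fresh, per-process compile directory.

    Mirrors compiledir=$HOME/.pytensor/compiledir_$(uuidgen) from the Slurm script.
    """
    base = base if base is not None else Path.home() / ".pytensor"
    compiledir = base / f"compiledir_{uuid.uuid4()}"
    return f"compiledir={compiledir}"


# Must run before pytensor (or pymc) is first imported; like the shell
# export, this replaces any PYTENSOR_FLAGS that were already set.
os.environ["PYTENSOR_FLAGS"] = unique_compiledir_flag()
# import pymc as pm  # import only after the flag is in place
```

Giving each job its own cache presumably stops parallel jobs from reading and writing the same key.pkl files, at the cost of recompiling in every job and leaving one directory per run under ~/.pytensor. If one directory per Slurm job is preferred, the job's SLURM_JOB_ID environment variable could replace the UUID.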

Over the course of the day I started several experiments in parallel and got no errors. Thank you very much for your support!