OverflowError: Python int too large to convert to C long

I’m stuck on an error. I’m looking for a solution, and I also hope to use this case to learn how to debug PyMC models.

The following is a small BART model that reproduces the OverflowError. I encountered the same issue with another model, and I've found several similar questions, but the other posts don't contain any data or code to replicate the issue. I'm hoping this small model is useful for pinning down the problem.

The data I used is in the attached file, and the code reads:

import pymc as pm
import pymc_bart as pmb
import numpy as np

data = np.loadtxt("../data/data.csv", delimiter=",")
print(data.shape)
X = data[:, :-1]
Y = data[:, -1]
print(X.shape, Y.shape)

n_obs = len(Y)

rng = np.random.default_rng(42)

with pm.Model() as model:
    w = pmb.BART("w", X=X, Y=Y, m=100)
    mean = pm.Deterministic("mean", w)
    sigma = pm.HalfNormal("sigma", sigma=0.1)
    y = pm.TruncatedNormal(
        'y',
        mu=mean,
        sigma=sigma,
        lower=-0.1,
        upper=0.1, observed=Y
    )

    idata = pm.sample(random_seed=rng, compute_convergence_checks=False)

data.csv (29.5 KB)

The traceback I got reads:

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[40], line 13
      4 sigma = pm.HalfNormal("sigma", sigma=0.1)
      5 y = pm.TruncatedNormal(
      6     'y',
      7     mu=mean,
   (...)     10     upper=0.1, observed=Y
     11 )
---> 13 idata = pm.sample(random_seed=rng, compute_convergence_checks=False)

File ~/.local/share/virtualenvs/MachineLearning-4jP8FBMK/lib/python3.12/site-packages/pymc/sampling/mcmc.py:928, in sample(draws, tune, chains, cores, random_seed, progressbar, progressbar_theme, step, var_names, nuts_sampler, initvals, init, jitter_max_retries, n_init, trace, discard_tuned_samples, compute_convergence_checks, keep_warning_stat, return_inferencedata, idata_kwargs, nuts_sampler_kwargs, callback, mp_ctx, blas_cores, model, compile_kwargs, **kwargs)
    926 _print_step_hierarchy(step)
    927 try:
--> 928     _mp_sample(**sample_args, **parallel_args)
    929 except pickle.PickleError:
    930     _log.warning("Could not pickle model, sampling singlethreaded.")

File ~/.local/share/virtualenvs/MachineLearning-4jP8FBMK/lib/python3.12/site-packages/pymc/sampling/mcmc.py:1408, in _mp_sample(draws, tune, step, chains, cores, rngs, start, progressbar, progressbar_theme, traces, model, callback, blas_cores, mp_ctx, **kwargs)
   1405 strace = traces[draw.chain]
   1406 if not zarr_recording:
   1407     # Zarr recording happens in each process
-> 1408     strace.record(draw.point, draw.stats)
   1409 log_warning_stats(draw.stats)
   1411 if callback is not None:

File ~/.local/share/virtualenvs/MachineLearning-4jP8FBMK/lib/python3.12/site-packages/pymc/backends/ndarray.py:116, in NDArray.record(self, point, sampler_stats)
    114     for data, vars in zip(self._stats, sampler_stats):
    115         for key, val in vars.items():
--> 116             data[key][draw_idx] = val
    117 elif self._stats is not None:
    118     raise ValueError("Expected sampler_stats")

OverflowError: Python int too large to convert to C long

My environment specifics:

Ubuntu 24.04
Python 3.12
pymc 5.25.1
pymc-bart 0.10.0

Thanks a lot in advance!


Some sampler statistic uses a Python integer that's larger than what int64 can encode.

It would be great to see what that (key, val) pair is when it fails.

What should I do to get the pair?

I suspect the BART output w grows very large, but I don't know how to check that, or how to apply prior knowledge to it, e.g., to constrain w within a range.

You could try going into an interactive debugger. The simplest approach is to edit the source code and change it to something like:

try:
    data[key][draw_idx] = val
except OverflowError:
    print(f"Writing summary statistic failed for {key=}, {val=}")
    raise

The Python file is located at ~/.local/share/virtualenvs/MachineLearning-4jP8FBMK/lib/python3.12/site-packages/pymc/backends/ndarray.py, and for you the failing code is at line 116.

CC @aloctavodia while we are at it

Here is what I got:

Writing summary statistic failed for key='variable_inclusion', val=124116368418827357883

And the traceback:

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[3], line 13
      4 sigma = pm.HalfNormal("sigma", sigma=0.1)
      5 y = pm.TruncatedNormal(
      6     'y',
      7     mu=mean,
   (...)     10     upper=0.1, observed=Y
     11 )
---> 13 idata = pm.sample(random_seed=rng, compute_convergence_checks=False)

File ~/.local/share/virtualenvs/MachineLearning-4jP8FBMK/lib/python3.12/site-packages/pymc/sampling/mcmc.py:928, in sample(draws, tune, chains, cores, random_seed, progressbar, progressbar_theme, step, var_names, nuts_sampler, initvals, init, jitter_max_retries, n_init, trace, discard_tuned_samples, compute_convergence_checks, keep_warning_stat, return_inferencedata, idata_kwargs, nuts_sampler_kwargs, callback, mp_ctx, blas_cores, model, compile_kwargs, **kwargs)
    926 _print_step_hierarchy(step)
    927 try:
--> 928     _mp_sample(**sample_args, **parallel_args)
    929 except pickle.PickleError:
    930     _log.warning("Could not pickle model, sampling singlethreaded.")

File ~/.local/share/virtualenvs/MachineLearning-4jP8FBMK/lib/python3.12/site-packages/pymc/sampling/mcmc.py:1408, in _mp_sample(draws, tune, step, chains, cores, rngs, start, progressbar, progressbar_theme, traces, model, callback, blas_cores, mp_ctx, **kwargs)
   1405 strace = traces[draw.chain]
   1406 if not zarr_recording:
   1407     # Zarr recording happens in each process
-> 1408     strace.record(draw.point, draw.stats)
   1409 log_warning_stats(draw.stats)
   1411 if callback is not None:

File ~/.local/share/virtualenvs/MachineLearning-4jP8FBMK/lib/python3.12/site-packages/pymc/backends/ndarray.py:117, in NDArray.record(self, point, sampler_stats)
    115 for key, val in vars.items():
    116     try:
--> 117         data[key][draw_idx] = val
    118     except OverflowError:
    119         print(f"Writing summary statistic failed for {key=}, {val=}")

OverflowError: Python int too large to convert to C long
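
For context, that value is far beyond what int64 can hold. A quick sanity check with plain NumPy (independent of PyMC) reproduces the same error:

```python
import numpy as np

val = 124116368418827357883  # the value from the debug print above

# int64 tops out at 2**63 - 1 = 9223372036854775807, so this value
# cannot fit in the int64-backed sampler-stats array.
print(val > np.iinfo(np.int64).max)  # True

arr = np.zeros(1, dtype=np.int64)
try:
    arr[0] = val  # the same kind of assignment the backend performs
except OverflowError as err:
    print("OverflowError:", err)
```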

@aloctavodia

Hey everyone, just checking in on the BART model issue. Any sense of when we might have a fix for this?

I’m trying to figure out my next steps: if it’ll be sorted in a few days, I’m happy to wait. But if it’s going to take longer, I might jump in and try implementing the model in another language to keep things moving.

If I go that route, what do you think would be our best bet? TensorFlow Probability? Stan? Or something else entirely? I’d love to get your thoughts.

Thanks a lot!

The issue is with the variable inclusion statistic. We used to store it as a list of vectors (or something similar), but we recently switched to an integer encoding. That is fine as long as it stays a Python integer, but we get the error when it is converted to int64. Not sure how I missed this earlier. We should go back to the previous way of storing the variable inclusion, or think of a better alternative.
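
To illustrate how such an integer encoding can blow past int64, here is a hypothetical packing scheme (not necessarily pymc-bart's exact one): storing per-variable inclusion counts as base-1000 digits of a single integer grows the value exponentially with the number of variables.

```python
import numpy as np

# Hypothetical encoding: pack per-variable inclusion counts into one
# integer, base 1000. NOT necessarily pymc-bart's actual scheme.
counts = [7, 42, 3, 118, 0, 55, 9, 1]

encoded = 0
for c in counts:
    encoded = encoded * 1000 + c  # shift by one base-1000 "digit"

# With just 8 variables the packed value already exceeds int64's
# ~9.2e18 range, while remaining a valid arbitrary-precision Python int.
print(encoded > np.iinfo(np.int64).max)  # True

# Decoding recovers the counts, so the encoding itself is lossless.
decoded = []
x = encoded
for _ in counts:
    decoded.append(x % 1000)
    x //= 1000
decoded.reverse()
print(decoded == counts)  # True
```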

You could store it in a NumPy object array (instead of an int64 array). Not sure whether ArviZ will be happy about it when we convert to InferenceData, though.
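
The object-dtype idea can be sketched like this with plain NumPy (whether ArviZ/NetCDF accepts it downstream is the open question):

```python
import numpy as np

big = 124116368418827357883  # larger than int64 max

# An object-dtype array stores references to Python objects, so each
# element is an arbitrary-precision Python int and cannot overflow.
stats = np.empty(3, dtype=object)
stats[0] = big
stats[1] = big * big
stats[2] = 0

print(type(stats[0]))        # <class 'int'>
print(stats[1] == big ** 2)  # True
```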

You can temporarily patch it locally with code similar to what you used for debugging; just pass instead of re-raising the error. The sampler statistics aren't needed for sampling itself, they're only there for the user afterwards.

try:
    data[key][draw_idx] = val
except OverflowError:
    pass  # Or store as -1 to be identifiable later

That's what we used to do. It works, but it created some issues when saving the InferenceData/NetCDF to disk.

@ccyang you can try installing pymc-bart directly from GitHub with `pip install git+https://github.com/pymc-devs/pymc-bart.git`; a new release will be ready soon.


Upgrading to pymc-bart 0.11.0 solved the issue. Thank you, guys!
