RuntimeError: Unsupported type: ArrowStringArray during nutpie.sample

I am experiencing a RuntimeError when using nutpie.sample for a large-scale Beta Mixture Model (~460k points). The model compiles successfully using nutpie.compile_pymc_model(model), but the error triggers immediately at the sampling stage. I have already attempted to sanitize model.coords, but the issue persists.
I would appreciate any insights or guidance you could provide to help me resolve this issue and get the sampling process running.
Thanks.

Environment

  • PyMC version: 5.27.0

  • Nutpie version: 0.16.4

  • Python version: 3.11.14

  • PyTensor version: 2.36.3

  • pyarrow version: 23.0.0

  • pandas version: 3.0.0

import numpy as np
import pymc as pm
import nutpie
import pymc.distributions.transforms as tr


scaled_data = np.random.beta(2, 2, size=4000000)  # Simulated data
obs_idx = np.arange(len(scaled_data))


with pm.Model(coords={"obs_id": obs_idx}) as model:
    # Mixture weights for the three components
    w = pm.Dirichlet("w", a=np.array([1, 1, 1]))
    # Ordered component means to keep the components identifiable
    mu = pm.Beta("mu", alpha=[1.5, 7.5, 13.5], beta=[13.5, 7.5, 1.5], shape=3, transform=tr.ordered, initval=[0.1, 0.5, 0.9])

    # Per-component concentration; alphas/betas use the mean-concentration parameterization
    kappa = pm.Exponential("kappa", lam=0.1, shape=3)
    alphas = pm.Deterministic("alphas", mu * kappa)
    betas = pm.Deterministic("betas", (1 - mu) * kappa)

    components = pm.Beta.dist(alpha=alphas, beta=betas, shape=3)
    mixture = pm.Mixture("mixture", w=w, comp_dists=components, observed=scaled_data, dims="obs_id")

    print("\n[INFO] Compiling model for Rust engine (Nutpie)...", flush=True)
    compiled_model = nutpie.compile_pymc_model(model)
    print("[INFO] Compilation complete", flush=True)
    trace = nutpie.sample(compiled_model)  # The RuntimeError is raised here

Traceback (most recent call last):
  File "/home/jen96/project/PofO_Detection/99.Script/01.Pipeline_iDMR/Release2/02.HiFi/01.Revio/02.Diploid/04.Read_Classification/BMM_nutpie_test.py", line 28, in <module>
    trace = nutpie.sample(compiled_model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jen96/miniconda3/envs/nutpie_env/lib/python3.11/site-packages/nutpie/sample.py", line 840, in sample
    sampler = _BackgroundSampler(
              ^^^^^^^^^^^^^^^^^^^
  File "/home/jen96/miniconda3/envs/nutpie_env/lib/python3.11/site-packages/nutpie/sample.py", line 501, in __init__
    self._sampler = compiled_model._make_sampler(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jen96/miniconda3/envs/nutpie_env/lib/python3.11/site-packages/nutpie/compile_pymc.py", line 151, in _make_sampler
    model = self._make_model(init_mean)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jen96/miniconda3/envs/nutpie_env/lib/python3.11/site-packages/nutpie/compile_pymc.py", line 195, in _make_model
    return _lib.PyMcModel(
           ^^^^^^^^^^^^^^^
RuntimeError: Coordinate unconstrained_parameter value has unsupported type

Caused by:
    RuntimeError: Could not convert to Value. Unsupported type: ArrowStringArray


cc @aseyboldt

Edit: I just hit this at work; it appears to be a pandas 3.0 issue. Downgrading pandas should solve it.


As @jessegrabowski said, this is due to a change in pandas 3.0.
A work-in-progress fix is here: "fix: compatibility with pandas 3.0 for string coords" by aseyboldt (Pull Request #273, pymc-devs/nutpie on GitHub).
Until the fix is released, you can downgrade pandas, or convert coords with strings to numpy arrays manually.
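
The manual conversion mentioned above could look roughly like the sketch below. The helper name `coords_to_numpy` is hypothetical (it is not part of PyMC or nutpie), and using an object-dtype array for string labels is an assumption about what the affected nutpie version can convert. Note also that the traceback in this thread names the `unconstrained_parameter` coordinate, which is not among the coords passed to `pm.Model` in the example, so sanitizing user-supplied coords may not be sufficient in every case, as the original poster found.

```python
import numpy as np


def coords_to_numpy(coords):
    """Return a copy of `coords` whose values are plain NumPy arrays.

    Hypothetical helper for illustration; not part of PyMC or nutpie.
    """
    converted = {}
    for name, values in coords.items():
        # Going through a list drops any pandas/Arrow container type
        arr = np.asarray(list(values))
        # Assumption: string labels are safest as an object array of Python str
        if arr.dtype.kind == "U":
            arr = arr.astype(object)
        converted[name] = arr
    return converted


coords = coords_to_numpy({"component": ["low", "mid", "high"], "obs_id": range(5)})
print({name: arr.dtype for name, arr in coords.items()})
```

The converted dictionary can then be passed as `pm.Model(coords=coords)` before compiling with nutpie.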

Downgrading pandas solved the problem. Thanks!

Thanks to your team’s help, I’ve successfully implemented nutpie.
I have one final question: is there a way to access and print real-time status updates from nutpie, similar to what the code below does for pm.sample? I'm already running with the progress_bar=True option.

Thanks

import pymc as pm
from pymc.progress_bar import ProgressBarManager

old_update = ProgressBarManager.update

def new_update(self, chain_idx, is_last, draw, tuning, stats): 
    print(chain_idx, draw)  # Do whatever you want with this info
    old_update(self, chain_idx, is_last, draw, tuning, stats)
    
ProgressBarManager.update = new_update

with pm.Model() as m:
    x = pm.Normal("x")
    pm.sample(tune=5, draws=5, chains=2)

That’s not currently possible, but it might make it into the next release.

By the way, the fix for pandas 3.0 has been released; it should work fine with nutpie 0.16.5.
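
To check whether an environment still has the problematic combination, a small version check along these lines may help. It is only a sketch: the function names are hypothetical, and the rule it encodes (pandas 3.0 or later together with nutpie older than 0.16.5) is an assumption drawn from this thread rather than an official compatibility statement.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Parse the leading numeric parts of a version, e.g. '0.16.4' -> (0, 16, 4)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def likely_affected(pandas_version, nutpie_version):
    """Assumption from this thread: pandas >= 3.0 with nutpie < 0.16.5 hits the error."""
    return release_tuple(pandas_version) >= (3,) and release_tuple(nutpie_version) < (0, 16, 5)


try:
    if likely_affected(version("pandas"), version("nutpie")):
        print("pandas 3.x with nutpie < 0.16.5: upgrade nutpie or downgrade pandas")
    else:
        print("This pandas/nutpie combination is not the one reported in this thread")
except PackageNotFoundError:
    print("pandas or nutpie is not installed in this environment")
```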

Oh, thank you for the notice! I hope one day it will be available.
Thanks