PyMC Compilation Nesting with pytensor.scan

I’m building a purely additive model, with no intercept value, where

Y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n

In this case Y represents the hours a person worked during the day, and each X_i is the number of times they completed task type i during the day. Here’s a synthetic dataset for this problem:

import copy
import numpy as np
import pandas as pd
import pymc as pm
import pytensor
import pytensor.tensor as pt
from typing import List

np.random.seed(1234)
task_number = 300
record_number = 20 * task_number

# create random matrix of tasks completed for each record/day
df = pd.DataFrame()
df['time_taken'] = np.random.normal(loc=8, scale=1, size=record_number)  # Roughly 8 hour work day
tasks = np.arange(1, task_number + 1)
for i in tasks:
    df[f'task_{i}'] = 0
for i in df.index:
    # Randomly select 3 to 5 tasks completed for each record (same task can be completed multiple times per day)
    cpt_tasks = np.random.choice(tasks, np.random.randint(3, 6), replace=True)
    for ct in cpt_tasks:
        df.loc[i, f'task_{ct}'] += 1
df.head()

task_columns = [c for c in df.columns if 'task_' in c]
df[task_columns] = df[task_columns].astype(float)
task_length_guesses = {f'task_{i}': 2.0 for i in tasks}  # Naive starting guess of 2 hours per task
sigma_lam = 1
start = copy.deepcopy(task_length_guesses)
start['modelsigma'] = sigma_lam

If I set task_number in the code above to something like 50, the code below will work because the compilation nesting doesn’t get too deep.

def create_pymc_coef_prior(coef_mode: float, task: str) -> pm.Lognormal:
    assert coef_mode > 0, f'task {task} has a non-positive standard hours value: {coef_mode}'
    lognormal_std = coef_mode / 3  # Set the standard deviation of prior in lognormal space to 1/3 of the mode
    coef_sigma = np.sqrt(np.log(1 + (lognormal_std / coef_mode) ** 2))  # Convert lognormal std to normal std
    assert coef_sigma > 0, f'task {task} has a non-positive sigma value: {coef_sigma}'
    coef_mu = np.log(coef_mode) + coef_sigma ** 2  # for lognormal \mu = \ln(\text{Mode}) + \sigma^2
    return pm.Lognormal(
        name=f"task_{task}",
        mu=coef_mu,
        sigma=coef_sigma,
        transform=None
    )


def create_pymc_coef_priors(task_columns: List[str], task_length_guesses: dict) -> List[pm.Lognormal]:
    coef_priors = list()
    for task_col in task_columns:
        task = task_col.split('_')[1]
        coef_mode = round(task_length_guesses[task_col], 2)
        coef_prior = create_pymc_coef_prior(coef_mode=coef_mode, task=task)
        coef_priors.append(coef_prior)
    return coef_priors

# Define the model
with pm.Model() as model:

    coef_priors = create_pymc_coef_priors(task_columns, task_length_guesses)
    expected_work_hours = sum([coef_prior * df[task_col].values for coef_prior, task_col
                               in zip(coef_priors, task_columns)])

    # Likelihood of observed WorkHours with lognormal distribution
    coef_sigma = pm.Exponential(name='modelsigma', lam=sigma_lam, transform=None)
    observed_workhours = pm.Normal(
        name='observed_workhours',
        mu=expected_work_hours,
        sigma=coef_sigma,
        observed=df['time_taken'].values,
    )
    trace = pm.sample(draws=2100, tune=1100, return_inferencedata=True, progressbar=True, cores=4,
                      init="adapt_diag", initvals=start)

However, I need the model to handle larger values for task_number, like 300 or 800. I’ve found some documentation about how I should be using pytensor.scan to get around this issue. Here’s a reference URL I’ve looked at:
https://pytensor.readthedocs.io/en/latest/library/scan.html

I did my best to follow these examples, but I’m definitely still getting something wrong. Here’s the current version of my pytensor.scan code. Can someone tell me what I’m doing wrong here?

def step(coef_mode, task_array, acc):
    # Create LogNormal prior for coefficient
    lognormal_std = coef_mode / 3  # Set the standard deviation of prior in lognormal space to 1/3 of the mode
    coef_sigma = np.sqrt(np.log(1 + (lognormal_std / coef_mode) ** 2))  # Convert lognormal std to normal std
    coef_mu = np.log(coef_mode) + coef_sigma ** 2  # for lognormal \mu = \ln(\text{Mode}) + \sigma^2
    coef_prior = pm.Lognormal(name='task_coeff', mu=coef_mu, sigma=coef_sigma, transform=None)

    return acc + coef_prior * task_array


coef_modes_tensor = pt.as_tensor_variable(list(task_length_guesses.values()))
operation_arrays_tensor = pt.stack([pt.as_tensor_variable(df[task_col].values) for task_col in task_columns])
initial_acc = pt.zeros_like(operation_arrays_tensor[0], dtype='float64')

with pm.Model() as model:

    # Use PyTensor's scan to compute expected workhours
    expected_work_hours, _ = pytensor.scan(
        fn=step,
        sequences=[coef_modes_tensor, operation_arrays_tensor],
        outputs_info=initial_acc
    )

    # Likelihood of observed WorkHours with lognormal distribution
    coef_sigma = pm.Exponential(name='modelsigma', lam=sigma_lam, transform=None)
    observed_workhours = pm.Normal(
        name='observed_workhours',
        mu=expected_work_hours[-1],  # We are interested in the last accumulated value
        sigma=coef_sigma,
        observed=df['time_taken'].values,
    )
    trace = pm.sample(draws=2100, tune=1100, return_inferencedata=True, progressbar=True, cores=4,
                      init="adapt_diag", initvals=start)

Can you write down the specific dependency structure between the priors you’re trying to achieve? From the code, it looks like you have something like \beta_{i} \sim \text{LogNormal}(\log(\beta_{i-1}), \sigma^2). Is that your intent?

I ask because the code you use to generate data doesn’t appear to have this dependency between tasks.

To debug your specific code, I’d need to see the traceback of the error you’re getting. I can say that you need to use the .dist method to create random variables in the scan inner function, so the coef_prior line will raise an error.
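For reference, the distinction being pointed at is roughly the following (a minimal sketch with hypothetical mu/sigma values, not code from this thread):

import pymc as pm

coef_mu, coef_sigma = 0.75, 0.33  # hypothetical values for illustration

# pm.Lognormal(name=..., ...) registers a named variable on the current model
# context, which is why that line raises inside a scan inner function.
# pm.Lognormal.dist(...) builds an unnamed random variable instead:
coef_prior = pm.Lognormal.dist(mu=coef_mu, sigma=coef_sigma)
print(coef_prior.type)  # a plain TensorVariable, not a registered model variable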

Hi Jesse,
Thank you for your response. To clarify, my intention is not to create any dependency structure between the priors. All of them should be independent of each other. I did not intend anything like \beta_i \sim \text{LogNormal}(\log(\beta_{i-1}), \sigma^2). Any dependency that might exist in the synthetic data was not intentional; I was merely trying to quickly create a dataset that roughly had the correct columns and rows so the code could be run.

To create reasonable priors, I have assumed mode values (\text{mode}_i) for the coefficient prior \text{LogNormal} distributions. I’m also arbitrarily setting the prior standard deviation in the \text{LogNormal} space to one third of the mode (\sigma_{\text{lognormal},i} = \text{mode}_i / 3). Then I need to transform those values into the Normal space, so I have the parameters needed to initialize the \text{LogNormal} distribution properly:

\sigma_{\text{normal},i} = \sqrt{\log(1 + (\sigma_{\text{lognormal},i}/\text{mode}_i)^2)}
\mu_{\text{normal},i} = \log(\text{mode}_i) + \sigma_{\text{normal},i}^2
\beta_i \sim \text{LogNormal}(\mu_{\text{normal},i}, \sigma_{\text{normal},i}^2)
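As a quick numeric check of those formulas (a small sketch; it relies on the lognormal identity \text{mode} = \exp(\mu - \sigma^2)):

import numpy as np

mode = 2.0                              # an assumed task length in hours
lognormal_std = mode / 3                # prior std set to a third of the mode
sigma = np.sqrt(np.log(1 + (lognormal_std / mode) ** 2))
mu = np.log(mode) + sigma ** 2          # mu = ln(mode) + sigma^2

# The mode of LogNormal(mu, sigma) is exp(mu - sigma^2), so this
# recovers 2.0 (up to float rounding)
print(np.exp(mu - sigma ** 2))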

Thanks for the advice about using the .dist method. I don’t know why I missed that in the other posts I referenced. When I update my step version of the code with that, it now looks like this:

def step(coef_mode, task_array, acc):
    # Create LogNormal prior for coefficient
    lognormal_std = coef_mode / 3  # Set the standard deviation of prior in lognormal space to 1/3 of the mode
    coef_sigma = np.sqrt(np.log(1 + (lognormal_std / coef_mode) ** 2))  # Convert lognormal std to normal std
    coef_mu = np.log(coef_mode) + coef_sigma ** 2  # for lognormal \mu = \ln(\text{Mode}) + \sigma^2
    coef_prior = pm.Lognormal.dist(mu=coef_mu, sigma=coef_sigma)

    return acc + coef_prior * task_array


coef_modes_tensor = pt.as_tensor_variable(list(task_length_guesses.values()))
operation_arrays_tensor = pt.stack([pt.as_tensor_variable(df[task_col].values) for task_col in task_columns])
initial_acc = pt.zeros_like(operation_arrays_tensor[0], dtype='float64')

with pm.Model() as model:

    # Use PyTensor's scan to compute expected workhours
    expected_work_hours, _ = pytensor.scan(
        fn=step,
        sequences=[coef_modes_tensor, operation_arrays_tensor],
        outputs_info=initial_acc
    )

    # Likelihood of observed WorkHours with lognormal distribution
    coef_sigma = pm.Exponential(name='modelsigma', lam=sigma_lam, transform=None)
    observed_workhours = pm.Normal(
        name='observed_workhours',
        mu=expected_work_hours[-1],  # We are interested in the last accumulated value
        sigma=coef_sigma,
        observed=df['time_taken'].values,
    )
    trace = pm.sample(draws=2100, tune=1100, return_inferencedata=True, progressbar=True, cores=4,
                      init="adapt_diag", initvals=start)

Here are the errors and tracebacks I get when I try to run that:

ERROR (pytensor.graph.rewriting.basic): Rewrite failure due to: constant_folding
ERROR (pytensor.graph.rewriting.basic): node: Join(0, [[0. 0. 0. ... 0. 0. 0.]], [[0. 0. 0. ... 0. 0. 0.]], ...
ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/graph/rewriting/basic.py", line 1922, in process_node
    replacements = node_rewriter.transform(fgraph, node)
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/graph/rewriting/basic.py", line 1081, in transform
    return self.fn(fgraph, node)
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/tensor/rewriting/basic.py", line 1110, in constant_folding
    thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
    return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
    outputs = cl.make_thunk(
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/basic.py", line 1189, in make_thunk
    cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/basic.py", line 1109, in __compile__
    thunk, module = self.cthunk_factory(
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/basic.py", line 1631, in cthunk_factory
    module = cache.module_from_key(key=key, lnk=self)
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/cmodule.py", line 1231, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/basic.py", line 1532, in compile_cmodule
    module = c_compiler.compile_str(
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/cmodule.py", line 2639, in compile_str
    raise CompileError(
pytensor.link.c.exceptions.CompileError: Compilation failed (return status=1):
/usr/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -undefined dynamic_lookup -I/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/numpy/core/include -I/opt/miniconda3/envs/myenv/include/python3.9 -I/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/c_code -L/opt/miniconda3/envs/myenv/lib -fvisibility=hidden -o /Users/myuser/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.9.19-64/tmpmfcfbpqd/m5827e5447ad45b3501581ba1dab81f701bd42767840fb02f1414980322660bb2.so /Users/myuser/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.9.19-64/tmpmfcfbpqd/mod.cpp
/Users/myuser/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.9.19-64/tmpmfcfbpqd/mod.cpp:5712:5: fatal error: bracket nesting level exceeded maximum of 256
    {Py_XINCREF(py_V507);}
    ^
/Users/myuser/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.9.19-64/tmpmfcfbpqd/mod.cpp:5712:5: note: use -fbracket-depth=N to increase maximum nesting level
1 error generated.
Traceback (most recent call last):
  File "/Users/myuser/dev/myenv/src/models/pymc_scan.py", line 174, in <module>
    trace = pm.sample(draws=2100, tune=1100, return_inferencedata=True, progressbar=True, cores=4,
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pymc/sampling/mcmc.py", line 679, in sample
    step = assign_step_methods(model, step, methods=pm.STEP_METHODS, step_kwargs=kwargs)
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pymc/sampling/mcmc.py", line 210, in assign_step_methods
    model_logp = model.logp()
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pymc/model/core.py", line 717, in logp
    rv_logps = transformed_conditional_logp(
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pymc/logprob/basic.py", line 631, in transformed_conditional_logp
    raise ValueError(RVS_IN_JOINT_LOGP_GRAPH_MSG % rvs_in_logp_expressions)
ValueError: Random variables detected in the logp graph: {lognormal_rv{0, (0, 0), floatX, False}.out}.
This can happen when DensityDist logp or Interval transform functions reference nonlocal variables,
or when not all rvs have a corresponding value variable.

You need to wrap that Scan inside a CustomDist in order to define a PyMC model variable. This PyMC example may be useful: “Time Series Models Derived From a Generative Graph” in the PyMC example gallery.

@ricardoV94,
That recommended article was useful. I wrapped my Scan inside a CustomDist as you recommended. Here’s the updated version of my code:

from pymc.pytensorf import collect_default_updates


def step(coef_mode, task_array, acc):
    # Create LogNormal prior for coefficient
    lognormal_std = coef_mode / 3  # Set the standard deviation of prior in lognormal space to 1/3 of the mode
    coef_sigma = np.sqrt(np.log(1 + (lognormal_std / coef_mode) ** 2))  # Convert lognormal std to normal std
    coef_mu = np.log(coef_mode) + coef_sigma ** 2  # for lognormal \mu = \ln(\text{Mode}) + \sigma^2
    coef_prior = pm.Lognormal.dist(mu=coef_mu, sigma=coef_sigma)

    new_acc = acc + coef_prior * task_array
    updates = collect_default_updates(inputs=[acc], outputs=[coef_prior, new_acc])

    return new_acc, updates


coef_modes_tensor = pt.as_tensor_variable(list(task_length_guesses.values()))
operation_arrays_tensor = pt.stack([pt.as_tensor_variable(df[task_col].values) for task_col in task_columns])
initial_acc = pt.zeros_like(operation_arrays_tensor[0], dtype='float64')


def custom_dist_func(coef_modes, task_arrays):
    expected_work_hours, _ = pytensor.scan(
        fn=step,
        sequences=[coef_modes, task_arrays],
        outputs_info=initial_acc
    )
    return expected_work_hours[-1]  # Return the final value


# Define a custom distribution using pm.CustomDist
with pm.Model() as model:
    
    custom_dist = pm.CustomDist(
        'custom_workhours',
        coef_modes_tensor,
        operation_arrays_tensor,
        dist=lambda coef_modes, task_arrays, size=None: custom_dist_func(coef_modes, task_arrays)
    )
    coef_sigma = pm.Exponential(name='modelsigma', lam=sigma_lam, transform=None)
    observed_workhours = pm.Normal(
        name='observed_workhours',
        mu=custom_dist,
        sigma=coef_sigma,
        observed=df['time_taken'].values,
    )

    trace = pm.sample(draws=2100, tune=1100, return_inferencedata=True, progressbar=True, cores=4,
                      init="adapt_diag", initvals=start)

Even after this change I’m getting the following error:

ERROR (pytensor.graph.rewriting.basic): TRACEBACK:
ERROR (pytensor.graph.rewriting.basic): Traceback (most recent call last):
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/graph/rewriting/basic.py", line 1922, in process_node
    replacements = node_rewriter.transform(fgraph, node)
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/graph/rewriting/basic.py", line 1081, in transform
    return self.fn(fgraph, node)
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/tensor/rewriting/basic.py", line 1110, in constant_folding
    thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/op.py", line 119, in make_thunk
    return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/op.py", line 84, in make_c_thunk
    outputs = cl.make_thunk(
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/basic.py", line 1189, in make_thunk
    cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/basic.py", line 1109, in __compile__
    thunk, module = self.cthunk_factory(
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/basic.py", line 1631, in cthunk_factory
    module = cache.module_from_key(key=key, lnk=self)
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/cmodule.py", line 1231, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/basic.py", line 1532, in compile_cmodule
    module = c_compiler.compile_str(
  File "/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/cmodule.py", line 2639, in compile_str
    raise CompileError(
pytensor.link.c.exceptions.CompileError: Compilation failed (return status=1):
/usr/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -Wno-c++11-narrowing -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -undefined dynamic_lookup -I/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/numpy/core/include -I/opt/miniconda3/envs/myenv/include/python3.9 -I/opt/miniconda3/envs/myenv/lib/python3.9/site-packages/pytensor/link/c/c_code -L/opt/miniconda3/envs/myenv/lib -fvisibility=hidden -o /Users/myuser/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.9.19-64/tmpmfcfbpqd/m5827e5447ad45b3501581ba1dab81f701bd42767840fb02f1414980322660bb2.so /Users/myuser/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.9.19-64/tmpmfcfbpqd/mod.cpp
/Users/myuser/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.9.19-64/tmpmfcfbpqd/mod.cpp:5712:5: fatal error: bracket nesting level exceeded maximum of 256
    {Py_XINCREF(py_V507);}
    ^
/Users/myuser/.pytensor/compiledir_macOS-14.6.1-x86_64-i386-64bit-i386-3.9.19-64/tmpmfcfbpqd/mod.cpp:5712:5: note: use -fbracket-depth=N to increase maximum nesting level
1 error generated.

Can you see if the problem arises in Google Colab as well? The bracket nesting issue is one that crops up now and then, but I never had a chance to debug/replicate it.


The recursion relationship is in your scan function, not in the synthetic data you generated. At each step, coef_mode[i] = acc[i-1] + coef_prior[i-1] * task_array[i-1] is the output of the previous step, and coef_mu is a function of coef_mode.

I don’t see coef_mode[i] = acc[i-1] + coef_prior[i-1] * task_array[i-1] anywhere in the code I’ve provided. What I provided was return acc + coef_prior * task_array. My understanding, which may be flawed, was that if I supplied a zero-initialized tensor (initial_acc), I could add up the effects from each coefficient (\beta_i) and task array (X_i) one step at a time using pytensor.scan and the defined step function. I’m assuming that when I return acc + coef_prior * task_array, that value gets passed into the next step call as the acc variable. If that’s the case, then pytensor.scan should be accumulating the independent effects from each \beta_i X_i combination one step at a time. Is there a flaw in this logic?
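As a plain-NumPy analog of that accumulation logic (a sketch with made-up numbers, not PyMC code):

import numpy as np

# Mimic scan: acc starts at zero, and each step adds one beta_i * X_i
# contribution that is independent of all the others.
betas = [2.0, 1.5, 3.0]                         # stand-ins for coefficient draws
task_arrays = [np.array([1., 0.]), np.array([0., 2.]), np.array([1., 1.])]

acc = np.zeros(2)
for beta, task_array in zip(betas, task_arrays):
    acc = acc + beta * task_array               # same as `return acc + coef_prior * task_array`
print(acc)                                      # [5. 6.] == sum_i beta_i * X_i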

@ricardoV94, that was a great idea! Even the non-pytensor.scan version of the code works in Google Colab, so it appears to be an issue with my Mac’s clang++ compiler. I’m not sure what OS Colab runs on, or what compiler it uses, but it gets past the compilation nesting issue.

I guess clang has a lower limit on bracket nesting than gcc, which Colab/Linux uses by default. Can you bump it with some flag?

I’ve tried bumping it with the -fbracket-depth=N flag several different ways, but the flag keeps getting ignored during the clang++ call no matter how I try to set it. I’ve tried altering it in my OS environment variables, I’ve tried setting it in code like os.environ['CXXFLAGS'] = "-fbracket-depth=2048" and os.environ['CFLAGS'] = "-fbracket-depth=2048", and I’ve tried modifying flags in pytensor like

    pytensor.config.cxxflags = "-fbracket-depth=2048"
    pytensor.config.cflags = "-fbracket-depth=2048"
    pytensor.config.gcc__cxxflags = "-fbracket-depth=2048"

Regardless, when clang++ gets called during the model compilation, it just ignores the flag and then crashes due to nesting.

If I try to switch to gcc explicitly, I get this warning:
WARNING (pytensor.configdefaults): Only clang++ is supported. With g++, we end up with strange g++/OSX bugs.

So, it seems that it’s only an issue on Macs, because we have to use clang++, which has this bracket nesting limit.

Perhaps @lucianopaz or @aseyboldt have an idea how that flag should be propagated or if it’s a bug in PyTensor?
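For what it’s worth, here is one way the flag is sometimes propagated (an assumption, not verified in this thread: PyTensor reads its config at import time, so PYTENSOR_FLAGS has to be set before the first import of pytensor):

import os

# Must run before pytensor is imported anywhere in the process.
# Despite the name, gcc__cxxflags is passed to whichever C++ compiler
# PyTensor uses, including clang++ on macOS.
os.environ["PYTENSOR_FLAGS"] = "gcc__cxxflags=-fbracket-depth=2048"

import pytensor
print(pytensor.config.gcc__cxxflags)  # should include the bracket-depth flag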

Why are you trying to define the priors and the linear expectation in “one step”/incrementally? If you are worried about performance, that’s misguided.

Defining a vector of beta priors (with shape=len(X)) and then taking the dot product with X to obtain y will be much more efficient than a Scan-based approach, and will likely avoid the bracket nesting issues you’re seeing.

PyMC doesn’t run code in an imperative manner like that. The priors will always be defined as a vector (even with a Scan CustomDist) and then operated on deterministically to calculate the logp. With Scan you have two disadvantages: you lose out on vectorization and your priors become interdependent, both of which make things slower/harder to sample.

If you are concerned about memory allocations, I don’t think Scan will be better either.

Scan is only really needed for time series / recurrent structural dependencies.
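For concreteness, a minimal sketch of that vectorized formulation, reusing the imports plus df, task_columns, task_length_guesses, and sigma_lam from the code above (the names betas, X, and vectorized_model are illustrative, not from the thread):

modes = np.array([task_length_guesses[c] for c in task_columns])
sigma = np.sqrt(np.log(1 + (1 / 3) ** 2))   # constant, since the lognormal std is mode / 3
mu = np.log(modes) + sigma ** 2             # mu = ln(mode) + sigma^2, per the formulas above
X = df[task_columns].values

with pm.Model() as vectorized_model:
    betas = pm.Lognormal('betas', mu=mu, sigma=sigma, shape=len(task_columns))
    expected_work_hours = pt.dot(X, betas)  # one dot product instead of a 300-step Scan
    model_sigma = pm.Exponential('modelsigma', lam=sigma_lam)
    pm.Normal('observed_workhours', mu=expected_work_hours, sigma=model_sigma,
              observed=df['time_taken'].values)
    trace = pm.sample()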
