Declaring priors using a loop results in "bracket nesting level exceeded maximum"

Hello,

I am new to PyMC and want to use the package to infer parameters. I want to construct the model dynamically, i.e. the number of RVs depends on the number of particles in the given data set. But for data sets with many particles, the for loop results in the following error:

fatal error: bracket nesting level exceeded maximum of 256

A minimal example to reproduce the problem would be the following.

Note that for the sake of simplicity, all particles are assumed to be completely independent of each other, so the code may look a bit strange. However, in my case of interest, the initial value (and therefore mu) of each particle depends on the final value of another particle.

import numpy as np
import pymc as pm


# assume that all particles are observed from x=0 to x=1
x = np.linspace(0, 1)

# number of particles
# Note: N=100 is large enough to trigger the fatal error, whereas e.g.
# N=50 still works fine for me
N = 100
keys = [f'{i}' for i in range(N)]

# generate observed data
y_observed = {}
for key in keys:
    # for simplicity, assume that all particles follow the same exponential 
    # growth with normal noise and have the same true growth rate mu = 0.8 and 
    # sigma = 0.1
    y_observed[key] = np.random.normal(loc=np.exp(0.8 * x), 
                                       scale=0.1)

with pm.Model() as model:
    mu, y = {}, {}
    # assume that sigma should be shared among all particles
    sigma = pm.Normal('sigma')

    # add an RV (mu) and the likelihood of the observation (y) for each particle
    for key in keys:
        mu[key] = pm.Normal(f'mu_{key}', mu=1)
        y[key] = pm.Normal(f'y_{key}', mu=np.exp(mu[key] * x), sigma=sigma, observed=y_observed[key])
    trace = pm.sample()

I am grateful for any suggestions on how I could create such a large dynamic model :slight_smile:

PS: So far, with PYTENSOR_FLAGS='optimizer=fast_compile' I have not been able to reproduce the bug; everything works fine.
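For reference, a quick sketch of two ways to set this flag (option 2 assumes pytensor.config.optimizer is writable at runtime; I have only used the environment variable myself):

# Option 1: set the flag in the environment, before the first pytensor import
#   PYTENSOR_FLAGS='optimizer=fast_compile' python my_model.py

# Option 2: change the config at runtime (an assumption on my part;
# I have only tested option 1)
import pytensor
pytensor.config.optimizer = 'fast_compile'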

PPS: Could I use scan for this? If the solution is already given in Declaring Priors using Loops for State Space Models, I am sorry for not being able to transfer it to my case.

As you note, the particles are independent of each other in the current example so it’s hard to give advice. Can you post a simple version of the model you’re actually interested in, even if it’s in numpy code? Scan would be the most general solution, but you might also be able to get away with something more clever depending on the dependency structure between the particles.


@jessegrabowski: Thank you for your prompt reply :slight_smile:

I am interested in a branching process. It starts with a single particle that first accumulates y and then divides into two new particles. After the division, the y of the parent is divided equally between the two daughter particles.

A minimal model following this logic is given below:

import numpy as np
import pymc as pm


def get_keys(N_generation):
    """ Generate a binary Galton-Watson branching process (BP).

    The process starts with one particle 'R'; upon dying it gives birth to two
    new particles 'R0' and 'R1', each behaving the same as its mother...

    """
    keys = []
    waiting_list = ['R']
    while waiting_list:
        key = waiting_list.pop(0)
        keys.append(key)
        for daughter in ['0', '1']:
            if len(key) < N_generation:
                waiting_list.append(key + daughter)
    return keys

def get_lifetime(key):
    """ Return lifetime for given particle (key).

    Assume that each particle's lifetime is 1. So the first particle 'R' is
    observed from t=0 to t=1. It then gives birth to 'R0' and 'R1', both living
    one time step, i.e. t=1 to t=2. And so on...

    """
    return np.linspace(len(key) - 1, len(key), 100)

def model(t, mu, y_0):
    """ Exponential growth model."""
    t_0 = np.min(t)
    return y_0 * np.exp(mu * (t - t_0))

def get_y_0(key, mu, y_0_R):
    """ Return y_0 for the likelihood.

    Note: as the initial condition is inherited, it depends on the growth rates
    of all its ancestors and on the initial condition 'y_0_R' of 'R'.

    """
    if key == 'R':
        y_0 = y_0_R
    else:
        mother = key[:-1]
        # y of the mother at its last time point is split equally between
        # both daughters
        y_0 = 0.5 * model(t=get_lifetime(key=mother),
                          mu=mu[mother],
                          y_0=get_y_0(key=mother,
                                      mu=mu,
                                      y_0_R=y_0_R))[-1]
    return y_0

# get keys for a BP with 7 generations
# Note: with 7 or more generations, the code generates the error
keys = get_keys(N_generation=7)

# generate observed data
y_observed = {}
for key in keys:
    t = get_lifetime(key)

    # the initial value of the particle is inherited from the mother particle
    # in case of the mother assume y0 = 1
    if key == 'R':
        y_0 = 1
    else:
        mother = key[:-1]
        # Note, as keys are sorted, the mother entry always exists
        y_0 = 0.5 * y_observed[mother][-1]

    # for simplicity, assume that all particles follow the same exponential
    # growth model with normal noise and have the same true growth rate
    # mu = log(2) and sigma = 0.2
    y_observed[key] = np.random.normal(loc=model(t=t,
                                                 mu=np.log(2),
                                                 y_0=y_0),
                                       scale=0.2)


with pm.Model() as pymc_model:
    mu, y, y_data = {}, {}, {}
    # assume that sigma should be shared among all particles
    sigma = pm.Normal('sigma')
    y_0_R = pm.Uniform('y_0_R', 0, 2)

    # add an RV (mu) for each particle
    for key in keys:
        mu[key] = pm.Uniform(f'mu_{key}', 0, 2)
    #  add the likelihood of the observation (y) for each particle
    for key in keys:
        # get the initial condition for the given particle
        y_0 = get_y_0(key, mu, y_0_R)

        # Expected value of outcome
        y_expected = model(t=get_lifetime(key),
                           mu=mu[key],
                           y_0=y_0)
        # Here, we create a 'pm.ConstantData' container to hold the
        # observations; this container is passed to 'observed'
        y_data[key] = pm.ConstantData('y_data_' + key, y_observed[key])

        y[key] = pm.Normal(f'y_{key}',
                           mu=y_expected,
                           sigma=sigma,
                           observed=y_data[key])
    pymc_trace = pm.sample()

My first reaction is that this looks like a type of message-passing algorithm. I think the family tree of particles can be viewed as a directed graph like this:

import networkx as nx
from networkx.drawing.nx_agraph import graphviz_layout
import matplotlib.pyplot as plt

n_generations = 7
edges = [('R', 'R')]
for gen in range(n_generations):
    gen_slice = slice(2 ** gen - 1, 2 ** (gen + 1))
    parents = [x[1] for x in edges[gen_slice]]
    for parent in parents:
        edges.extend([(parent, f'{parent}{i}') for i in [0, 1]])


G = nx.DiGraph(edges[1:])
pos = graphviz_layout(G, prog='dot', root='R')

fig, ax = plt.subplots(figsize=(14, 4), dpi=144)
nx.draw_networkx(G, pos=pos, ax=ax)

This is handy because then you can use the adjacency matrix of this graph to do the iterative updating. Starting from a y0 vector of shape 2 ** (n_generations + 1) - 1 with a seed value in position 0, you can use matrix multiplication with the adjacency matrix to iteratively pass the final values of the particles forward down the network. This isn’t correct, but I’m thinking about something like:

A = nx.adjacency_matrix(G).todense().T
y0_vec = np.zeros(A.shape[0])
mu = np.log(2)
y0_R = 1.0
y0_vec[0] = y0_R * np.exp(mu + 1)

for gen in range(1, n_generations + 1):
    y0_vec = np.linalg.matrix_power(A, gen) @ y0_vec * np.exp(mu * gen)

At any rate, the purpose of representing it this way is that this computation is easy to scan over. If I'm off the mark, the take-away should just be that you want to get out of these recursive computations over dictionaries and think about data structures you can represent on a computation graph, like arrays, and implement your model that way.
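For the fully independent toy example at the top of the thread, "arrays instead of dictionaries" would look something like the sketch below: one batched RV with shape=N replaces the N separate mu_{key} variables. Untested, and the names are just illustrative:

import numpy as np
import pymc as pm

x = np.linspace(0, 1)
N = 100

# stack the per-particle observations into one (N, len(x)) array
y_obs = np.stack([np.random.normal(loc=np.exp(0.8 * x), scale=0.1)
                  for _ in range(N)])

with pm.Model() as vectorized_model:
    sigma = pm.Normal('sigma')
    # one batched RV replaces the N separate mu_{key} variables
    mu = pm.Normal('mu', mu=1, shape=N)
    # broadcasting: (N, 1) * (len(x),) -> (N, len(x))
    y = pm.Normal('y', mu=pm.math.exp(mu[:, None] * x),
                  sigma=sigma, observed=y_obs)
    trace = pm.sample()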


Thanks for that great idea. And yes, indeed, writing the problem as a directed graph is a nice way to vectorize it. I think the following code should describe my example:

A = nx.adjacency_matrix(G).todense().T
y0_vec = np.zeros(A.shape[0])
mu = np.log(2) + np.random.normal(0, 0.1, A.shape[0])
y0_R = 1.0
y0_vec[0] = y0_R * np.exp(mu[0])

res = y0_vec.copy()

for gen in range(1, n_generations + 1):
    y0_vec = A @ (np.diag(y0_vec) @ np.exp(mu) / 2)
    res += y0_vec

However, I have to admit that it is still unclear to me how to use this with the scan method.

You need to sort your variables into three groups:

  • sequences, stuff you want to iterate over
  • outputs_info, stuff you compute and then recursively feed back into the loop at every iteration
  • non_sequences, stuff you want to carry along into every loop iteration

I think you have no sequences, res and y0_vec will be outputs_info, and mu and A will be non_sequences.

After sorting, you write the inner function. You have to sort first because the inputs to the inner function have to be in a certain order: sequences, then outputs_info, then non_sequences. So if my classification is right, you’ll have an inner function like this:

import pytensor
import pytensor.tensor as pt

def step(last_y0, last_res, A, mu):
    new_y0 = A @ (pt.diag(last_y0) @ pt.exp(mu) / 2)
    new_res = last_res + new_y0
    return new_y0, new_res

Then you give this all to scan:

(y0_sequence, res_sequence), updates = pytensor.scan(step, 
                                                     outputs_info=[y0_vec, y0_vec], 
                                                     non_sequences=[A, mu],
                                                     n_steps=n_generations) 

Warning: I didn’t test any of this code.

Also, all of this is better explained in the scan docs, found here.

Thanks again for your great help.

For the sake of completeness, here is the working code. Since scan saves the whole output sequence, new_res is not needed:

import pytensor
import pytensor.tensor as pt

def step(last_y0, A, mu):
    # pass half of each particle's final value on to its daughters
    new_y0 = A @ (last_y0 * pt.exp(mu) / 2)
    return new_y0

# dim = 2 ** (N_GEN + 1) - 1 is the total number of particles
y0_vec = pm.math.concatenate([[y0_R], pt.zeros(dim - 1)], axis=0)
y0_sequence, updates = pytensor.scan(step,
                                     outputs_info=y0_vec,
                                     non_sequences=[A, mu],
                                     n_steps=N_GEN)
# We only care about the merged vector
y0 = y0_sequence.sum(axis=0) + y0_vec
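For anyone finding this later, here is a sketch of how the scan output can be wired into the full PyMC model. It assumes the graph G and the y_observed dictionary from the earlier posts, and that G's node order matches the breadth-first order of the keys; I have not verified every detail:

import numpy as np
import networkx as nx
import pymc as pm
import pytensor
import pytensor.tensor as pt

N_GEN = 7
dim = 2 ** (N_GEN + 1) - 1      # total number of particles in the tree
tau = np.linspace(0, 1, 100)    # time within a single lifetime

# A[child, parent] = 1; G is the family tree built earlier
A = np.asarray(nx.adjacency_matrix(G).todense().T)

# stack the observations in the same (breadth-first) order as G's nodes
y_obs = np.stack([y_observed[key] for key in G.nodes])

with pm.Model() as pymc_model:
    sigma = pm.Normal('sigma')
    y0_R = pm.Uniform('y_0_R', 0, 2)
    mu = pm.Uniform('mu', 0, 2, shape=dim)

    def step(last_y0, A, mu):
        # pass half of each particle's final value on to its daughters
        return A @ (last_y0 * pt.exp(mu) / 2)

    y0_init = pm.math.concatenate([[y0_R], pt.zeros(dim - 1)], axis=0)
    y0_seq, _ = pytensor.scan(step,
                              outputs_info=y0_init,
                              non_sequences=[A, mu],
                              n_steps=N_GEN)
    y0 = y0_seq.sum(axis=0) + y0_init

    # every particle grows from its own y0 for one unit of (rescaled) time
    y_expected = y0[:, None] * pt.exp(mu[:, None] * tau)
    y = pm.Normal('y', mu=y_expected, sigma=sigma, observed=y_obs)
    pymc_trace = pm.sample()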