Declare priors using loop results in bracket nesting level exceeded maximum


I am new to pymc and I am currently wanting to use the package to infer parameters. I want to construct the model dynamically, i.e. the number of RVs depends on the number of particles in the given data set. But for datasets with many particles, the for loop results in the following error:

fatal error: bracket nesting level exceeded maximum of 256

A minimal example to reproduce the problem would be the following.

Note that for the sake of simplicity, all particles are assumed to be completely independent of each other, and therefore the code looks a bit strange / or you might think I am a bit stupid. However, in my case of interest, the initial value (and therefore mu) of each particle depends on the final value of another particle.

import numpy as np
import pymc as pm

# assume that all particles are observed from x=0 to x=1
x = np.linspace(0, 1)

# number of particles
# Note, N=100 is large enough to generate the fatal error, whereas for example 
# N=50 would still work fine for me
keys = [f'{i}' for i in range(N)]

# generate observed data
y_observed = {}
for key in keys:
    # for simplicity, assume that all particles follow the same exponential 
    # growth with normal noise and have the same true growth rate mu = 0.8 and 
    # sigma = 0.1
    y_observed[key] = np.random.normal(loc=np.exp(0.8 * x), 

with pm.Model() as model:
    mu, y = {}, {}
    # assume that sigma should be shared among all particles
    sigma = pm.Normal('sigma')

    #  add a rv (mu) and the likelihood of the observation (y) for each particle
    for key in keys:
        mu[key] = pm.Normal(f'mu_{key}', mu=1)
        y[key] = pm.Normal(f'y_{key}', mu=np.exp(mu[key] * x), sigma=sigma, observed=y_observed[key])

I am grateful for any suggestions on how I could create such a large dynamic model :slight_smile:

PS: So far, by using the PYTENSOR_FLAGS='optimizer=fast_compile', I have not been able to reproduce the bug, it has worked fine.

PPS: Could I use scan for this? In case the solution is already given in Declaring Priors using Loops for State Space Models, I am sorry for not being able to transfer it.

As you note, the particles are independent of each other in the current example so it’s hard to give advice. Can you post a simple version of the model you’re actually interested in, even if it’s in numpy code? Scan would be the most general solution, but you might also be able to get away with something more clever depending on the dependency structure between the particles.

1 Like

@jessegrabowski: Thank you for your prompt reply :slight_smile:

I am interested in a branching process. It starts with a single particle that first accumulates y and then divides into two new particles. After the division, the y of the parent is divided equally between the two daughter particles.

A minimal model following this logic is given below:

import numpy as np
import pymc as pm

def get_keys(N_generation):
    """ Generate a binary Galton-Watson branching process (BP).

    The process starts with one particle 'R', upon dying it gives birth to two
    new particles 'R0' and 'R1', each behaving the same is its mother...

    keys = []
    waiting_list = ['R']
    while waiting_list:
        key = waiting_list.pop(0)
        for daughter in ['0', '1']:
            if len(key) < N_generation:
                waiting_list.append(key + daughter)
    return keys

def get_lifetime(key):
    """ Return lifetime for given particle (key).

    Assume that each particles lifetime is 1. So the first particle 'R' is
    observed from t=0 to t=1. It then give birth to 'R0' and 'R1', both living
    one time step, i.e. t=1 to t=2. And so on...

    return np.linspace(len(key) - 1, len(key), 100)

def model(t, mu, y_0):
    """ Exponential growth model."""
    t_0 = np.min(t)
    return y_0 * np.exp(mu * (t - t_0))

def get_y_0(key, mu, y_0_R):
    """ Return y_0 for the likelihood.

    Note, as the initial condition is inherited, it depends on all growth rate
    of its ancestors and the intial condition 'y_0_R' of 'R'.

    if key == 'R':
        y_0 = y_0_R
        mother = key[:-1]
        # y of mother at its last timepoint is splitted equally between
        # both daughters
        y_0 = 0.5 * model(t=get_lifetime(key=mother),
    return y_0

# get keys for BP with 7 generations
# Note, starting from 7 gen and higher, the code generates the error
keys = get_keys(N_generation=7)

# generate observed data
y_observed = {}
for key in keys:
    t = get_lifetime(key)

    # the initial value of the particle is inherited by the mother particle
    # in case of the mother assume y0 = 1
    if key == 'R':
        y_0 = 1
        mother = key[:-1]
        # Note, as keys are sorted, the mother entry always exists
        y_0 = 0.5 * y_observed[mother][-1]

    # for simplicity, assume that all particles follow the same exponential
    # growth model with normal noise and have the same true growth rate
    # mu = log(2) and sigma = 0.2
    y_observed[key] = np.random.normal(loc=model(t=t,

with pm.Model() as pymc_model:
    mu, y, y_data = {}, {}, {}
    # assume that sigma should be shared among all particles
    sigma = pm.Normal('sigma')
    y_0_R = pm.Uniform('y_0_R', 0, 2)

    #  add a rv (mu) for each particle
    for key in keys:
        mu[key] = pm.Uniform(f'mu_{key}', 0, 2)
    #  add the likelihood of the observation (y) for each particle
    for key in keys:
        # get the initial condition for the given particle
        y_0 = get_y_0(key, mu, y_0_R)

        # Expected value of outcome
        y_expected = model(t=get_lifetime(key),
        # Here, we create a 'pm.ConstantData' containers to hold the
        # observations, as thisa container is passed to the 'observed'
        y_data[key] = pm.ConstantData('y_data_' + key, y_observed[key])

        y[key] = pm.Normal(f'y_{key}',
    pymc_trace = pm.sample()

My first reaction is that this looks like a type of message passing algorithm? I think the family tree of particles could be viewed as a directed graph like this:

import networkx as nx
from networkx.drawing.nx_agraph import graphviz_layout
import matplotlib.pyplot as plt

n_generations = 7
edges = [('R', 'R')]
for gen in range(n_generations):
    gen_slice = slice((2**gen -1), 2 ** (gen + 1))
    parents = [x[1] for x in edges[gen_slice]]
    for parent in parents:
        edges.extend([(parent, f'{parent}{i}') for i in [0, 1]])

G = nx.DiGraph(edges[1:])
pos = graphviz_layout(G, prog='dot', root='R')

fig, ax = plt.subplots(figsize=(14, 4), dpi=144)
nx.draw_networkx(G, pos=pos, ax=ax)

This is handy because then you can use the adjacency matrix of this graph to do the iterative updating. Starting from a y0 vector of shape 2 ** (n_generations + 1) - 1 with a seed value in position 0, you can use matrix multiplication with the adjacency matrix to iteratively pass the final values of the particles forward down the network. This isn’t correct, but I’m thinking about something like:

A = nx.adjacency_matrix(G).todense().T
y0_vec = np.zeros(A.shape[0])
mu = np.log(2)
y0_R = 1.0
y0_vec[0] = y0_R * np.exp(mu + 1)

for gen in range(1, n_generations + 1):
    y0_vec = np.linalg.matrix_power(A, gen) @ y0_vec * np.exp(mu * gen)

At any rate the purpose of trying to represent it this way is that this computation is easy to scan over. If I’m off the mark, the take-away should just be that you want to get out of these recursive computations over dictionaries and think about data structures you can represent on a computation graph, like arrays, and implement your model that way.


Thanks for that great idea. And yes, indeed, writing the problem as a directed graph is a nice way to vectorize it. I think the following code should describe my example:

A = nx.adjacency_matrix(G).todense().T
y0_vec = np.zeros(A.shape[0])
mu = np.log(2) + np.random.normal(0,0.1,A.shape[0])
y0_R = 1.0
y0_vec[0] = y0_R * np.exp(mu[0])

res = y0_vec

for gen in range(1, n_generations + 1):
   y0_vec = A @ (np.diag(y0_vec) @ np.exp(mu) / 2)
   res += y0_vec

However, I have to admit that it is still unclear to me on how to use this with the scan method.

You need to sort your variable into three groups:

  • sequences, stuff you want to iterate over
  • outputs_info, stuff you want to compute on then recursively feed back into the loop at every iteration
  • non_sequences, stuff you want to carry along into every loop iteration

I think you have no sequences, res and y0_vec will be outputs_info, and mu and A will be non_sequences.

After sorting, you write the inner function. You have to sort first because the inputs to the inner function have to be in a certain order: sequences, then outputs_info, then non_sequences. So if my classification is right, you’ll have an inner function like this:

def step(last_y0, last_res, A, mu):
    new_y0 = A @ (pt.diag(last_y0) @ pt.exp(mu) / 2)
    new_res = last_res + new_y0
    return new_y0, new_res

Then you give this all to scan:

(y0_sequence, res_sequence), updates = pytensor.scan(step, 
                                                     outputs_info=[y0_vec, y0_vec], 
                                                     non_sequences=[A, mu],

Warning: I didn’t test any of this code.

Also, all of this is better explained in the scan docs, found here.

Thanks again for your great help.

For the sake of completeness, here is the working code.
As scan saves all sequences, new_res is not needed

def step(last_y0, A, mu):
    new_y0 = A @ (last_y0 * pt.exp(mu) / 2)
    return new_y0
y0_vec = pm.math.concatenate([[y0_R], pt.zeros(dim - 1)], axis=0) 
y0_sequence, updates = pytensor.scan(step,
                                     non_sequences=[A, mu],
# We only care about the merged vector
y0 = y0_sequence.sum(axis=0) + y0_vec
1 Like