Very short time step and memory error

Hi all,

Very short time step is required for time evolution, but the observable time is in seconds. The following code will cause a memory error. Is there any way to resolve this error?
I would be thankful if you could help me with this question.

import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import arviz as az
from sklearn.datasets import make_regression

import pymc as pm
import pytensor
import pytensor.tensor as pt
from pytensor.scan.utils import until
from pymc.pytensorf import collect_default_updates

# Initialize random number generator
RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)
az.style.use("arviz-darkgrid")

time = 600
steps_per_second = 10000000
n_steps = time * steps_per_second
dt =1 / steps_per_second


def ode_update_func(A, B, C, k1, k2, am, bm):
    A_new = A + (-k1*pt.power(A, am))*dt
    B_new = B + (k1*pt.power(A, am) - k2*pt.power(B, bm))*dt
    C_new = C + (k2*pt.power(B, bm))*dt
    
    return A_new, B_new, C_new


with pm.Model() as model:
    k1 = pm.TruncatedNormal("k1", mu=0.3, sigma=0.2, lower=0)
    k2 = pm.TruncatedNormal("k2", mu=0.03, sigma=0.02, lower=0)
    
    am = pm.TruncatedNormal("am", mu=1.0, sigma=0.2, lower=0)
    bm = pm.TruncatedNormal("bm", mu=1.0, sigma=0.2, lower=0)
    
    sigma = pm.InverseGamma("sigma", alpha=5, beta=0.2)
    
    A_init = np.array(1.0, dtype=np.float64)
    B_init = np.array(0.0, dtype=np.float64)
    C_init = np.array(0.0, dtype=np.float64)
    
    results, updates = pytensor.scan(
        fn=ode_update_func,
        outputs_info=[A_init, B_init, C_init],
        non_sequences=[k1, k2, am, bm],
        n_steps=n_steps,
        strict=True
        )
    
    pm.Deterministic("result", pm.math.stack([results[0][::steps_per_second], results[1][::steps_per_second], results[2][::steps_per_second]]))

   idata = pm.sample(draws=10, tune=10, chains=1, nuts_sampler="numpyro") 
Only 10 samples in chain.
Compiling...
Compilation time = 0:00:00.358350
Sampling...
sample: 100%|██████████| 20/20 [00:00<00:00, 21.76it/s, 15 steps of size 9.55e-02. acc. prob=0.98]
Sampling time = 0:00:01.670323
Transforming variables...

If the kernel crashes during the Transforming variables step, it’s because model deterministics are too large to be stored in memory. In your case I imagine it’s an intermediate computation on results, since they will have shape (chains, draws, n_steps) of float64 memory, which I guess in your case is (1, 10, 6_000_000_000), which should require about 480 gigabytes to store.

1 Like

Thank you for your quick reply!
Is there any way to thin out the results?
Is there any solution other than increasing memory?

If you only wanted the final states pytensor does some smart memory optimization. Since you want a sampling of intermediate states you would have to figure out something custom with a working memory.

I’m slightly baffled that you are asking for 6 billion timesteps. Before embarking on a quest of detailed memory optimization, I’d carefully reflect on whether that’s necessary.

What application do you have that requires a sequence of 6 billion numbers?

@jessegrabowski @ricardoV94
When simulating the time evolution of a chemical reaction, the time step should be set from 10^-6(μs) to 10^-9(ns). However, observational data are measured at intervals of about one point per minute.

Therefore, only 10 timesteps(600s/60s/min), not 6 billion timesteps, is actually needed in the likelihood calculation.

results[0][::steps_per_second]

Since you want a sampling of intermediate states you would have to figure out something custom with a working memory.

Could you please tell me how to implement this?

I honestly have no idea, you’d have to dig deep into pytensor scan and figuring out how memory allocation works.

Since your model only has a few parameters, I’d honestly suggest you do it in numpy. You can pre-allocate an array of size time, then use the step % steps_per_second-th row as working memory, overwriting the intermediate values at every step, and only saving the final one. Wrap this custom function into a pytensor Op.

You would lose gradients, but as long as the problem is low dimensional something like slice or SMC should do a fine job sampling it. You can even use numba to jit the inner loop and get a big speedup.

Thank you for your thoughtful response.
I will look deeper into Scan, but following the code may be difficult for me.
And unfortunately, the actual problem I want to solve is a more complex problem with more parameters. However, I will challenge the following methods.

Since your model only has a few parameters, I’d honestly suggest you do it in numpy. You can pre-allocate an array of size time , then use the step % steps_per_second -th row as working memory, overwriting the intermediate values at every step, and only saving the final one. Wrap this custom function into a pytensor Op .

I have created the following code.
Is there anything inappropriate in the Op code?

from pytensor.compile.ops import as_op
from numba import njit

def ode_update_func(A, B, C, k1, k2, am, bm):
    A_new = A + (-k1*np.power(A, am))*dt
    B_new = B + (k1*np.power(A, am) - k2*np.power(B, bm))*dt
    C_new = C + (k2*np.power(B, bm))*dt
    
    A_new = np.clip(A_new, 0, np.inf)
    B_new = np.clip(B_new, 0, np.inf)
    C_new = np.clip(C_new, 0, np.inf)
    
    return A_new, B_new, C_new

@as_op(itypes=[pt.dscalar, pt.dscalar, pt.dscalar, pt.dscalar, pt.dscalar, pt.dscalar, pt.dscalar], otypes=[pt.dmatrix])
def ode_numpy(A_init, B_init, C_init, k1, k2, am, bm):
    y = np.zeros((time, 3))
    y[0, 0] = A_init
    y[0, 1] = B_init
    y[0, 2] = C_init
    idx_pre = 0
    for i_step in range(n_steps):
        idx = i_step // steps_per_second
        y[idx, 0], y[idx, 1], y[idx, 2] = ode_update_func(y[idx_pre, 0], y[idx_pre, 1], y[idx_pre, 2], k1, k2, am, bm) 
        idx_pre = idx
        
    return y

with pm.Model() as model:
    k1 = pm.TruncatedNormal("k1", mu=0.3, sigma=0.2, lower=0)
    k2 = pm.TruncatedNormal("k2", mu=0.03, sigma=0.02, lower=0)
    
    am = pm.TruncatedNormal("am", mu=1.0, sigma=0.2, lower=0)
    bm = pm.TruncatedNormal("bm", mu=1.0, sigma=0.2, lower=0)
    sigma = pm.InverseGamma("sigma", alpha=1.0, beta=1.0)
    
    A_init = pt.as_tensor_variable(np.array(1.0, dtype=np.float64))
    B_init = pt.as_tensor_variable(np.array(0.0, dtype=np.float64))
    C_init = pt.as_tensor_variable(np.array(0.0, dtype=np.float64))
    
    results = ode_numpy(A_init, B_init, C_init, k1, k2, am, bm)
    
    obs = pm.Normal("obs", mu=results, sigma=sigma, observed=results_obs)

vars_list = list(model.values_to_rvs.keys())[:-1]

# Specify the sampler
sampler = "Slice Sampler"
tune = draws = 2000

# Inference!
with model:
 trace_slice = pm.sample(step=[pm.Slice(vars_list)], tune=tune, draws=draws)
trace = trace_slice
az.summary(trace)

I don’t see anything obviously wrong. I’d njit the ode_numpy function, and I wouldn’t unpack the 3 values of y (instead i’d concatenate the return from ode_update_func), but these are personal choices.

I’d also recommend you try SMC, it’s a very good option for low dimensional problems.

It was highly likely that it was simply too slow. It is now working.
Thank you for your advice.