How to increase mxstep in "sunode"

Hello everyone,

Please forgive my beginner question.

I’m currently working on simulating very complex differential equations related to chemical reactions to determine the reaction constants.

I’m using sunode.wrappers.as_pytensor.solve_ivp() for this calculation, but I keep getting many warnings like the following:

[CVODES ERROR]  CVode
  At t = X.XXXXX, mxstep steps taken before reaching tout.


[CVODEA ERROR]  CVodeB
  Error occured while integrating backward problem # 0

To avoid this error, I have tried adjusting the tolerance, changing the solver, and other tweaks, but none of these have been very effective. I suspect the issue might be due to the stiffness of the original differential equations.

I’m considering increasing the mxstep parameter because I don’t mind if the simulation takes several times longer. However, with my current skills, I couldn’t figure out from the source code whether this is possible.

Any advice or guidance would be greatly appreciated.
Alternatively, if anyone knows of a better solution, I would be grateful for your guidance.
I appreciate your help.

Best regards,
battikoi

P.S. I have an additional question. If this error occurs, what happens to the calculation?
Does the solver keep retrying the step until it accidentally reaches a trial without error?

If so, this means that the simulation is affected by problem-dependent errors.
In that case, it seems like sampling could give incorrect results, or am I understanding this incorrectly?


CC @aseyboldt


I encounter similar errors just running the example from GitHub/the docs (and an error about the RHS routine failing at the first call). In the end the sampling does seem to complete, but it would be great if someone could provide some guidance.

I’m adding this because it may be helpful to someone.

After further experimentation, I confirmed that the problem improved by changing the solver from BDF to ADAMS and reducing the absolute and relative tolerances from the default 10^-10 to 10^-14. (Many other libraries that use SUNDIALS have also suggested this lower-tolerance fix for the same error.)
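
For illustration, here is roughly how such settings can be passed. The tolerances go through the solver_kwargs argument of solve_ivp, which is forwarded to the solver; the keyword names abstol/reltol are an assumption on my part and may differ by sunode version, so please check sunode.solver.Solver before copying this:

# Sketch only: tolerance settings passed via solver_kwargs.
# The keyword names "abstol"/"reltol" are an assumption -- check your sunode version.
import sunode.wrappers.as_pytensor

solution, *rest = sunode.wrappers.as_pytensor.solve_ivp(
    t0=0.0,
    y0=y0_dict,          # placeholder: your initial states
    params=params_dict,  # placeholder: your parameters
    tvals=tvals,         # placeholder: your output times
    rhs=rhs_func,        # placeholder: your right-hand side
    solver_kwargs={"abstol": 1e-14, "reltol": 1e-14},
)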

However, the problem still remains. In my model, hundreds of errors occur per draw, and there is no room to reduce the tolerances further.

I have summarized my problems.

  1. The sampling time is very long, probably due to the large number of errors. I would like to try to improve it by increasing mxstep. (If this is not possible, I plan to consider simplifying the differential equations. I suspect I’ll end up doing this anyway, given the nature of the equations.)

  2. Points where the calculation fails are never accepted into the chains, so isn’t it possible that the posterior distribution ends up strangely clipped?

I apologize for writing repeatedly.
I will continue to work on improving it myself, but I would appreciate any help from the community, thank you.

Hello battikoi

Please forgive my beginner question.

Not at all, those are good questions :slight_smile:

Each time the error occurs, the solver failed to solve an ODE because it had to take too many very small steps and couldn’t make much progress. solve_ivp will then just return NaNs as the result.

The sampler handles this by stopping the trajectory it was currently computing (for each sample it solves several ODEs, typically somewhere between 3 and 1000; for a nice visualization of what it is doing see for instance the MCMC Interactive Gallery). The next position will be a draw from the trajectory it has built up to that point. All solver failures will be reported as divergences in the sampler stats.

So first off, this might not be a problem at all. If those divergences only happen early during tuning, for instance, it might just be that while the sampler was trying to find the typical set it ended up trying to solve the ODE for very strange parameters, but everything that’s in the typical set (so anything you end up seeing as posterior draws) could be perfectly fine. So if you don’t see any divergences in your posterior draws, I’d say don’t worry about this. You can check the total number of divergences in your posterior using trace.sample_stats.diverging.sum(). But not all divergences have to be due to solver issues; they could also be due to bad posterior geometry.
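
For example (a quick sketch, where trace is assumed to be the InferenceData returned by pm.sample):

# total number of divergent transitions over all chains and draws
n_div = int(trace.sample_stats.diverging.sum())
print(n_div, "divergences")

# per-chain counts, to see whether one chain is particularly affected
print(trace.sample_stats.diverging.sum(dim="draw").values)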

But if this keeps happening after tuning, this is in fact an issue, and could very well bias your posterior estimates.

In that case I think this indicates a problem with the ODE, and I wouldn’t really expect increasing mxstep to help. Possible reasons include:

  • The problem is stiff but the solver is built for non-stiff problems. You mentioned you tried to change the solver, so I guess that’s not it.
  • For some parameter values the ODE is not well defined. This would require changes to the ODE itself.
  • For some parameter values the ODE is so ill-conditioned that the solver can’t deal with it anymore (given the tolerances, at least). Maybe you are lucky and those parameter values also don’t make much sense (ill-conditioned settings coincide surprisingly often with unrealistic ones, in my experience :wink: ). In that case maybe your parametrization or prior could be changed? Maybe you can rescale/reformulate the ODE in some way that makes it better behaved?
  • Numerical issues, i.e. underflows/overflows. Maybe rescaling or log-transforming some states or parameters can help? (See the sketch after this list.)
  • The jacobian is incorrect. This would be a sunode bug.
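
As an illustration of the rescaling/log-transform point above (a generic sketch, not specific to sunode): if a state stays positive but spans many orders of magnitude, you can integrate u = log(y) instead of y, which often improves the conditioning:

import numpy as np

# original ODE: dy/dt = f(t, y), with y > 0
def f(t, y, k):
    return -k * y**2            # toy example: second-order decay

# transformed ODE in u = log(y): du/dt = f(t, exp(u)) / exp(u)
def f_log(t, u, k):
    y = np.exp(u)
    return f(t, y, k) / y       # here this simplifies to -k * y

# the initial condition is transformed the same way
u0 = np.log(1e-8)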

You can also just increase mxstep; maybe for some parameters the problem just happens to be bad, but not so terrible that the solver can’t deal with it anymore. I’ll make a PR to sunode to make it easier to change that parameter.


Thank you for your detailed advice, @aseyboldt
If it is not too much trouble, I would like to pose a few additional questions.

  • The problem is stiff but the solver is built for non-stiff problems. You mentioned you tried to change the solver, so I guess that’s not it.

I understand that currently sunode uses CVODE and therefore only supports ADAMS and BDF solvers. Is that correct?

You can check the total number of divergences in your posterior using trace.sample_stats.diverging.sum().

I tried this method and got an answer of 42 for 200 draws in the test.
Does this mean that 42 of the 200 draws are meaningless? Or does it simply mean that 42 draws were discarded and redone?
(In other words, does this value theoretically grow to infinity?)

Also, thanks for the explanation. I definitely misunderstood about NUTS.

I understand now that even if solve_ivp() encounters an mxstep error, the sampler continues from the trajectory it has built up to that point. My model is somewhat high-dimensional, so as you say, it is quite possible that most of the parameter sets it tries are strange ones that result in a large number of ODE errors. I can still expect the final posterior distribution to be within a reasonable range. It may also be that the prior distribution is set too loosely for the high dimensionality of the problem.

As a test, I ran very short tuning and drawing runs, and that may have been the problem. In my model, I need to solve the ODE 10 times, for 10 different initial conditions, for each draw, so the errors may accumulate and cause a lot of divergences.

After checking the distribution of error t values, I found that increasing mxstep by 10 times may halve the error, so I plan to try mxstep adjustment.

Although my skill level does not allow me to contribute to the library, I am looking forward to improvements in sunode.

I understand that currently sunode uses CVODE and therefore only supports ADAMS and BDF solvers. Is that correct?

Yes, that’s correct. ADAMS is for non-stiff problems, BDF for stiff problems.

Does this mean that 42 of the 200 draws are meaningless? Or does it simply mean that 42 draws were discarded and redone?

It means that in 42 out of 200 draws it ran into an issue while it was solving one of the ODEs in your model, or the geometry of the posterior gave it trouble.

I’m afraid this can be much worse than just those draws being invalid; the whole posterior could be invalid in this case. Let’s assume the problem is that the ODE solver always fails if some parameter alpha is smaller than 1 (in practice this is probably much more complicated). It could very well be that pretty much all of the posterior mass should be at alpha values smaller than 1. But each time the sampler tries to explore that space, it fails because of the solver failure. So you end up with all draws at alpha > 1, and lots of divergences.

Having divergences breaks some of the assumptions we need to show that the sampler will converge to the true posterior at some point, so they can really be a big issue.

As a test, I ran very short tuning and drawing runs, and that may have been the problem.

Yes, that’s usually not helpful. If the number of tuning steps is too small the mass matrix and the step sizes will be off, and nothing will work properly.
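
For instance (a sketch of the relevant pm.sample arguments, with model standing in for your PyMC model):

import pymc as pm

with model:
    idata = pm.sample(
        draws=1000,
        tune=2000,           # enough adaptation steps for mass matrix and step size
        target_accept=0.95,  # smaller steps, which often reduces divergences
    )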

After checking the distribution of error t values, I found that increasing mxstep by 10 times may halve the error, so I plan to try mxstep adjustment.

I’m not sure what you mean here. How can you estimate the effect of increasing mxstep if you can’t increase mxstep?

Thank you for your answer.

Yes, that’s usually not helpful. If the number of tuning steps is too small the mass matrix and the step sizes will be off, and nothing will work properly.

I was very wrong on this point. I now understand that the poor tuning results may have caused the sampler to keep exploring the wrong space, resulting in errors due to poor parameter sets.

So I understand now that what I can do at this point is:

  1. believe that the solver works in the vicinity of the posterior distribution I want to obtain.
  2. spend enough time tuning and simulating.

I’m not sure what you mean here. How can you estimate the effect of increasing mxstep if you can’t increase mxstep?

The error messages include the value of t at which mxstep was used up. So I retrieved the t values from the numerous error messages and checked their histogram, and found that most of the errors occurred at around 1/100 of the maximum value of t that I had set. There were not many examples of termination in the small range below t = 10^0 (the range of t in my model is approximately 10^2–10^3).

In chemical reaction ODEs, the change is large early on and becomes smaller later, so the step size is expected to grow. Therefore, I expect that the error would not have occurred if I had increased mxstep by the factor obtained by dividing the maximum value of t that I had set by the value of t at the error termination:

\text{factor} = \frac{t_{\text{set\_max}}}{t_{\text{error}}}

Based on this expectation and the histogram, I estimated that even with a pessimistic prediction, about 50% of the errors would be eliminated if I multiplied mxstep by 10. Of course, the situation may get worse due to the multiple ODE simulations in one draw, or it may get better due to the effect of the step size.
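
In case it helps anyone reproduce this, here is a rough sketch of how the t values can be extracted, assuming the CVODES warnings were captured into a text file (the file name and the capture mechanism are placeholders, not part of sunode):

import re
import numpy as np

# placeholder: a log file containing the captured CVODES warnings
with open("cvodes_warnings.log") as fh:
    log = fh.read()

# lines look like: "At t = 12.345, mxstep steps taken before reaching tout."
t_errors = np.array([float(m) for m in re.findall(r"At t = ([0-9.eE+-]+),", log)])

t_set_max = 1e3  # placeholder: the largest t in my tvals
print(len(t_errors), "mxstep failures")
print("median factor t_set_max / t_error:", np.median(t_set_max / t_errors))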

I’m afraid that I might be mistaken about this point as well.
Do the t values in the error messages correspond to the original ODE’s t values? Or am I making some fatal mistake?

I read the source code and README and did some trial and error.

As a result, I solved it by monkey patching sunode.wrappers.as_pytensor.solve_ivp.
I have also seen that this dramatically reduces error terminations during integration (over 90% reduction).

I think this is because the probability density is highly skewed with respect to the parameter space.

For other beginners who are having trouble, I’ll copy and paste my code. It’s messy and terrible (especially the imports), but I hope it helps someone.

FYI, @aseyboldt.
(But please let me know if there seems to be a problem.)

from typing import Any, Callable, Dict, Optional

import numpy as np
import pandas as pd
import sympy as sym
import pytensor.tensor as pt
from pytensor.graph.basic import Constant, Variable

import sunode
import sunode.symode.problem
import sunode.wrappers.as_pytensor
from sunode import basic, solver, symode
from sunode.basic import lib
from sunode.dtypesubset import as_flattened
from sunode.wrappers.as_pytensor import SolveODE, SolveODEAdjoint

### monkey patch for solve_ivp to increase mxstep
def new_solve_ivp(
    t0: float,
    y0: np.ndarray,
    params: Dict[str, Any],
    tvals: np.ndarray,
    rhs: Callable[[sym.Symbol, np.ndarray, np.ndarray], Dict[str, Any]],
    derivatives: str = 'adjoint',
    coords: Optional[Dict[str, pd.Index]] = None,
    make_solver=None,
    derivative_subset=None,
    solver_kwargs=None,
    simplify=None,
) -> Any:
    dtype = basic.data_dtype
    mxstep = 500000  # set mxstep (the initial value is 5000)
    if solver_kwargs is None:
        solver_kwargs={}

    if derivatives == "forward":
        params = params.copy()
        params["__initial_values"] = y0

    # copied from sunode's solve_ivp: extracts dimension names and checks shapes/dtypes
    def read_dict(vals, name=None):
        if isinstance(vals, dict):
            return {name: read_dict(item, name) for name, item in vals.items()}
        else:
            if isinstance(vals, tuple):
                tensor, dim_names = vals
            else:
                tensor, dim_names = vals, pt.as_tensor_variable(vals, dtype="float64").type.shape
                if any(d is None for d in dim_names):
                    raise ValueError(
                        'Shapes of tensors need to be statically known or given explicitly.'
                    )
            if isinstance(dim_names, (str, int)):
                dim_names = (dim_names,)
            tensor = pt.as_tensor_variable(tensor, dtype="float64")
            if tensor.ndim != len(dim_names):
                raise ValueError(
                    f"Dimension mismatch for {name}: Value has rank {tensor.ndim}, "
                    f"but {len(dim_names)} was specified."
                )
            assert np.dtype(tensor.dtype) == dtype, tensor
            tensor_dtype = np.dtype(tensor.dtype)
            if tensor_dtype != dtype:
                raise ValueError(
                    f"Dtype mismatch for {name}: Got {tensor_dtype} but expected {dtype}."
                )
            return dim_names

    y0_dims = read_dict(y0)
    params_dims = read_dict(params)

    if derivative_subset is None:
        derivative_subset = []
        for path, val in as_flattened(params).items():
            if isinstance(val, tuple):
                tensor, _ = val
            else:
                tensor = val
            if isinstance(tensor, Variable):
                if not isinstance(tensor, Constant):
                    derivative_subset.append(path)

    problem = symode.problem.SympyProblem(
        params_dims, y0_dims, rhs, derivative_subset, coords=coords, simplify=simplify)

    flat_tensors = as_flattened(params)
    vars = []
    for path in problem.params_subset.subset_paths:
        tensor = flat_tensors[path]
        if isinstance(tensor, tuple):
            tensor, _ = tensor
        vars.append(pt.as_tensor_variable(tensor, dtype="float64").reshape((-1,)))
    if vars:
        params_subs_flat = pt.concatenate(vars)
    else:
        params_subs_flat = pt.as_tensor_variable(np.zeros(0), dtype="float64")

    vars = []
    for path in problem.params_subset.remainder.subset_paths:
        tensor = flat_tensors[path]
        if isinstance(tensor, tuple):
            tensor, _ = tensor
        vars.append(pt.as_tensor_variable(tensor, dtype="float64").reshape((-1,)))
    if vars:
        params_remaining_flat = pt.concatenate(vars)
    else:
        params_remaining_flat = pt.as_tensor_variable(np.zeros(0), dtype="float64")

    flat_tensors = as_flattened(y0)
    vars = []
    for path in problem.state_subset.paths:
        tensor = flat_tensors[path]
        if isinstance(tensor, tuple):
            tensor, _ = tensor
        vars.append(pt.as_tensor_variable(tensor, dtype="float64").reshape((-1,)))
    y0_flat = pt.concatenate(vars)

    t0 = pt.as_tensor_variable(t0, dtype="float64")
    tvals = pt.as_tensor_variable(tvals, dtype="float64")

    if derivatives == 'adjoint':
        sol = solver.AdjointSolver(problem, **solver_kwargs)
        lib.CVodeSetMaxNumSteps(sol._ode, mxstep) # mxstep setting
        print("patched!")
        wrapper = SolveODEAdjoint(sol)
        flat_solution = wrapper(y0_flat, params_subs_flat, params_remaining_flat, t0, tvals)
        solution = problem.flat_solution_as_dict(flat_solution)
        return solution, flat_solution, problem, sol, y0_flat, params_subs_flat
    elif derivatives == 'forward':
        if not "sens_mode" in solver_kwargs:
            raise ValueError("When `derivatives=True`, the `solver_kwargs` must contain one of `sens_mode={\"simultaneous\" | \"staggered\"}`.")
        sol = solver.Solver(problem, **solver_kwargs)
        lib.CVodeSetMaxNumSteps(sol._ode, mxstep) # mxstep setting
        print("patched!")
        wrapper = SolveODE(sol)
        flat_solution, flat_sens = wrapper(y0_flat, params_subs_flat, params_remaining_flat, t0, tvals)
        solution = problem.flat_solution_as_dict(flat_solution)
        return solution, flat_solution, problem, sol, y0_flat, params_subs_flat, flat_sens, wrapper
    elif derivatives in [None, False]:
        sol = solver.Solver(problem, sens_mode=False)
        lib.CVodeSetMaxNumSteps(sol._ode, mxstep) # mxstep setting
        print("patched!")
        assert False  # the no-derivatives branch is not implemented here

sunode.wrappers.as_pytensor.solve_ivp = new_solve_ivp
### patch end
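
A note on using the patch: because it rebinds the module attribute, it only affects code that looks the function up as sunode.wrappers.as_pytensor.solve_ivp(...) after the patch has run. If you had already imported the function directly (from sunode.wrappers.as_pytensor import solve_ivp), that local name still points to the original, so call it through the module instead, for example:

import sunode.wrappers.as_pytensor as sun_pt

solution, *rest = sun_pt.solve_ivp(...)  # same arguments as the original solve_ivp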