Debug mode in PyTensor

When sampling fails, it is common to get this error message:

HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

Strangely, there is no code snippet available that shows how to actually do this.

Here is some sample code that will produce this error:

import pymc as pm
import numpy as np

y_data = np.random.normal(0, 1, 30)

with pm.Model() as model:
    s = pm.HalfNormal('s', sigma=-1)
    m = pm.Normal('m', mu=0, sigma=1)
    y = pm.Normal('y', mu=m, sigma=s, observed=y_data)
    
    idata = pm.sample_prior_predictive(samples=100)

You can see the docs for debugging pytensor here. This includes how to set debug mode. The only extra thing you need to know is that since the function compilation happens inside sample, you need to set a context manager like this:

import pytensor

with pytensor.config.change_flags(mode='DEBUG_MODE'):
    idata = pm.sample()

In general I don’t find this particularly helpful, though. If you read through the output carefully, you should find a more informative error. In your case I actually get a ValueError because sigma is negative, not a pytensor error.

I’ll spoil it and tell you that 99% of PyMC errors are just shape errors 🙂


Try model.debug() and let us know if it would have helped you figure out the problem

Thanks for responding!

I’ve looked through the PyTensor debugging page in the past. It seems pretty involved. I shied away from it because all of my code is at the level of PyMC.

It doesn’t tell me how to “re-run with most PyTensor optimizations disabled” or “set the PyTensor flag ‘optimizer=fast_compile’”, although I glean from your response that I can use pytensor.config.change_flags within a context to do that.

Part of my concern is in teaching. I teach a Bayesian course and last year I moved to PyMC, but I’m unsure about how deeply into PyTensor the students need to go. The more I use PyMC, the more it seems like you can’t really be effective without a fairly deep understanding of PyTensor.

Opher


Did you try model.debug() as @ricardoV94 suggested? This should reveal the most common problems in models without you having to interact with the backend. I didn’t get a PyTensor error on your example either. What kinds of models are you planning to show students that you’re getting errors on?

PyTensor debug is of little use to PyMC users because users are not responsible for most of the graphs that PyMC builds under the hood for them.

You are absolutely correct to be puzzled (and even upset) by the default error message. It’s just that PyMC is a user of PyTensor and a very niche one at that.

That’s why we emphasize tools like model.debug or just doing pm.draw(var) to confirm whether shapes/values make sense as it’s often just shapes or bad parameter values, which shouldn’t require any deep understanding of PyTensor.


As a general programming tip: build slowly and incrementally. Tell your students to test each line they write: call pm.draw or pm.sample and see if it errors out (at least early on). It’s much easier to correct yourself because the number of culprits is much smaller. If you write 50 lines of code, then call pm.sample and find yourself with an obscure PyTensor error, it’s “already too late”.

In your example you would quickly figure out that things only fail when either s or y is added, and then only when sigma is specified… and then you would have found the problem without even looking at the traceback.


Thanks for all this. model.debug() is certainly helpful.

I’m not sure about the advice regarding building up models slowly. It’s oft repeated and I say it to my students, but in my own coding I find that I really need to write down a fullish model to start thinking clearly about how to simplify it rather than the other way around.

I also think that a lot of model errors come from the sampling somehow entering a regime which is disallowed by the priors, or maybe bad initial conditions. For this, it would be enormously helpful to be able to figure out which node was generating the problem. I understand this is difficult because of PyTensor optimization, but it would be nice if there were an obvious way to produce an un-optimized model that gives good debugging information.

Finally, about PyTensor: I’m actually just deciding that I need to introduce more material on PyTensor and XArray into the course. PyMC is very hard to use seriously without some comfort with PyTensor and similarly for Arviz and XArray. At least, that’s what I’m finding.

Opher

This shouldn’t happen by default, as all default distributions have transformations so that whatever the sampler proposes gets mapped into a valid value. The problem in your example was not the sampler failing to respect the prior, but the prior itself being given an invalid parameter.

The going line by line strategy can be done at the model level.

  1. Model with only the likelihood, with fixed parameters (can’t do MCMC sampling with it, but can do prior predictive)
  2. Model with likelihood and pooled mean prior
  3. Model with likelihood and pooled mean / sigma priors
  4. Mean varying with coefficients * covariates
  5. Coefficients varying by subject (no pooling)
  6. Coefficients varying by subject hierarchically (partial pooling)

At each step you have a fully specified and consistent model that is a simpler version of what comes next. This is not useful only for debugging but for model building itself. Once you’re familiar with this you can obviously jump levels, but when you find a problem, being able to go back the slow way can be really helpful.


Having said that, there’s nothing wrong with teaching PyTensor/XArray; I would just be careful to do that very gradually so as not to overload students. Obviously it depends on their predisposition.

Finally, getting back to the PyTensor traceback, the relevant lines you want to zoom in on when you see such an error are these (I reran your first example):

 File "/home/ricardo/miniconda3/envs/colgate-shelf-sow2/lib/python3.11/site-packages/pytensor/tensor/random/basic.py", line 375, in rng_fn_scipy
    return stats.halfnorm.rvs(loc, scale, random_state=rng, size=size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/miniconda3/envs/colgate-shelf-sow2/lib/python3.11/site-packages/scipy/stats/_distn_infrastructure.py", line 1057, in rvs
    raise ValueError(message)
ValueError: Domain error in arguments. The `scale` parameter must be positive for all distributions, and many distributions have restrictions on shape parameters. Please see the `scipy.stats.halfnorm` documentation for details.
Apply node that caused the error: halfnormal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FA1EC6DBD80>), [], 11, 0.0, -1)

Which you can reproduce manually:

import scipy.stats as st

st.halfnorm.rvs(0, -1)  # raises the same ValueError: scale must be positive

@opherdonchin I opened a PR to keep the stack trace to the original variables, this shows exactly which line of code created the faulty RV: Keep stack trace in random_make_inplace by ricardoV94 · Pull Request #735 · pymc-devs/pytensor · GitHub

The full traceback is now more verbose, and looks like this:

Traceback (most recent call last):
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/compile/function/types.py", line 970, in __call__
    self.vm()
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/graph/op.py", line 524, in rval
    r = p(n, [x[0] for x in i], o)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/tensor/random/op.py", line 330, in perform
    smpl_val = self.rng_fn(rng, *([*args, size]))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/tensor/random/basic.py", line 58, in rng_fn
    res = cls.rng_fn_scipy(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/tensor/random/basic.py", line 375, in rng_fn_scipy
    return stats.halfnorm.rvs(loc, scale, random_state=rng, size=size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/miniconda3/envs/pymc/lib/python3.11/site-packages/scipy/stats/_distn_infrastructure.py", line 1057, in rvs
    raise ValueError(message)
ValueError: Domain error in arguments. The `scale` parameter must be positive for all distributions, and many distributions have restrictions on shape parameters. Please see the `scipy.stats.halfnorm` documentation for details.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ricardo/miniconda3/envs/pymc/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-9b04c63d3b82>", line 11, in <module>
    idata = pm.sample_prior_predictive(samples=100)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/Documents/Projects/pymc/pymc/sampling/forward.py", line 417, in sample_prior_predictive
    values = zip(*(sampler_fn() for i in range(samples)))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/Documents/Projects/pymc/pymc/sampling/forward.py", line 417, in <genexpr>
    values = zip(*(sampler_fn() for i in range(samples)))
                   ^^^^^^^^^^^^
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/compile/function/types.py", line 983, in __call__
    raise_with_op(
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/link/utils.py", line 528, in raise_with_op
    raise exc_value.with_traceback(exc_trace)
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/compile/function/types.py", line 970, in __call__
    self.vm()
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/graph/op.py", line 524, in rval
    r = p(n, [x[0] for x in i], o)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/tensor/random/op.py", line 330, in perform
    smpl_val = self.rng_fn(rng, *([*args, size]))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/tensor/random/basic.py", line 58, in rng_fn
    res = cls.rng_fn_scipy(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/Documents/Projects/pytensor/pytensor/tensor/random/basic.py", line 375, in rng_fn_scipy
    return stats.halfnorm.rvs(loc, scale, random_state=rng, size=size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ricardo/miniconda3/envs/pymc/lib/python3.11/site-packages/scipy/stats/_distn_infrastructure.py", line 1057, in rvs
    raise ValueError(message)
ValueError: Domain error in arguments. The `scale` parameter must be positive for all distributions, and many distributions have restrictions on shape parameters. Please see the `scipy.stats.halfnorm` documentation for details.
Apply node that caused the error: halfnormal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FD4400504A0>), [], 11, 0.0, -1)
Toposort index: 1
Inputs types: [RandomGeneratorType, TensorType(int64, shape=(0,)), TensorType(int64, shape=()), TensorType(float32, shape=()), TensorType(int8, shape=())]
Inputs shapes: ['No shapes', (0,), (), (), ()]
Inputs strides: ['No strides', (0,), (), (), ()]
Inputs values: [Generator(PCG64) at 0x7FD4400504A0, array([], dtype=int64), array(11), array(0., dtype=float32), array(-1, dtype=int8)]
Outputs clients: [['output'], ['output', normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FD440052B20>), [30], 11, m, s)]]

Backtrace when the node is created (use PyTensor flag traceback__limit=N to make it longer):
  File "/home/ricardo/miniconda3/envs/pymc/lib/python3.11/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
    coro.send(None)
  File "/home/ricardo/miniconda3/envs/pymc/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3334, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "/home/ricardo/miniconda3/envs/pymc/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3517, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "/home/ricardo/miniconda3/envs/pymc/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-9b04c63d3b82>", line 7, in <module>
    s = pm.HalfNormal('s', sigma=-1)
  File "/home/ricardo/Documents/Projects/pymc/pymc/distributions/distribution.py", line 555, in __new__
    rv_out = cls.dist(*args, **kwargs)
  File "/home/ricardo/Documents/Projects/pymc/pymc/distributions/continuous.py", line 846, in dist
    return super().dist([0.0, sigma], **kwargs)
  File "/home/ricardo/Documents/Projects/pymc/pymc/distributions/distribution.py", line 635, in dist
    rv_out = cls.rv_op(*dist_params, size=create_size, **kwargs)

HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

It shows that it all starts from s = pm.HalfNormal('s', sigma=-1). I still find these very hard to read, but at least the info will be there.

Thanks. That’s helpful!