Why are low-level helper functions like log1pexp and log1mexp needed?

This is just out of curiosity.

My understanding is that Theano is supposed to optimize the PyMC3 graph so that users (or developers) don’t have to worry about low-level details like numerical optimizations or “mathematical tricks”. My question then is: what is the point of helper functions like log1pexp and log1mexp in the math submodule? Is this not one of the things that Theano should be able to do on its own? Does it save compilation time? Is it to help translate code from other libraries such as SciPy that make use of such helper methods?

Thank you for your attention :slight_smile:

Yeah, we could totally implement an optimization rule to do that automatically. cc @brandonwillard, I think it is worth putting on our road map.

Thanks for the reply.

Related to that, how can one know what kind of operations are already optimized and which ones are not?

log1pexp is essentially a renaming of theano.tensor.nnet.softplus–albeit one that is better accomplished with something as simple as from theano.tensor.nnet import softplus as log1pexp–and log1mexp looks like a type of approximation that’s probably intended to speed up computations.
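
As a quick sanity check (a sketch, assuming the usual pm/tt import aliases), you can confirm that pm.math.log1pexp builds the same underlying softplus Op as theano.tensor.nnet.softplus:

import pymc3 as pm
import theano.tensor as tt

x = tt.vector("x")

# Both should print the same Op, an Elemwise wrapping ScalarSoftplus
print(pm.math.log1pexp(x).owner.op)
print(tt.nnet.softplus(x).owner.op)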

That said, log1pexp is just an existing Op, so there’s no rewrite (i.e. “optimization”) to be made for/from it. Otherwise, there are already “specialization” and “stabilization” rewrites for exactly these kinds of log operations; they can be found in the theano.tensor.opt and theano.tensor.nnet.sigm modules. For instance, theano.tensor.nnet.sigm contains some “UltraFast” Ops that are highly reminiscent of log1mexp, and there are local_optimizers defined that replace standard sigmoid Ops with those “faster” versions.
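
If you want to browse what is registered, one rough way (a sketch; module layout and optimizer names vary across Theano versions) is to list the module-level local optimizers defined in those modules:

import theano.tensor.opt
import theano.tensor.nnet.sigm

from theano.gof.opt import LocalOptimizer

# Print the names of the local optimizers (e.g. local_log1p,
# local_ultra_fast_sigmoid) defined in each module.
for mod in (theano.tensor.opt, theano.tensor.nnet.sigm):
    names = sorted(name for name, obj in vars(mod).items()
                   if isinstance(obj, LocalOptimizer))
    print(mod.__name__, names)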

Note that certain rewrites need not (and should not) always be direct. A given rewrite can sometimes be decomposed into “smaller” independent rewrites, so, if you’re looking for a specific rewrite result, try to understand how it might’ve been broken down or–better yet–simply run a set of rewrites on a test graph to see whether or not a desired/expected rewrite is produced.


Hi @brandonwillard, thanks for the write-up.

This is still a bit above my head, but hopefully you can push me in the right direction. What I want to understand is why we need something like nnet.softplus or log1mexp (or whatever) from a user perspective. Why can’t Theano simply substitute it into place when it sees the equivalent “naive” mathematical expression that the user would be prone to write? My understanding was that this was supposed to be one of the advantages of using Theano (i.e., the user does not need to worry about choosing operators that might be faster or more stable than the naive formula, as Theano will take care of that).

Is it because there are different optimizations with different trade-offs, and therefore we cannot have a one-size-fits-all solution? Is it because it is faster to do it in advance than to have Theano figure it out every time the graph is compiled? Or is it because this was not even a goal for Theano in the first place? Or something else?

Does my question make sense?

Thanks for your attention :slight_smile:


That could be the case, but, to start, let’s find out exactly which rewrites we’re talking about by constructing a few concrete examples. From there we might also find that a rewrite is simply missing, but it’s hard to say otherwise.

Yes, it is good to avoid having to perform rewrites by promoting better graphs at the user level, but that alone isn’t a reason to omit a rewrite.

Rewrites are a main feature of Theano, so I’m not sure that I understand this question.

Thanks again for your reply @brandonwillard

So, to make my question concrete: is there any difference between the following two ways of writing a Theano “model”?

y = pm.math.log1pexp(x)  # or simply use softplus directly

and

y = tt.log(1 + tt.exp(x))

From a user standpoint, I would prefer to always write my code as in the second example, and not have to worry about possible optimized operators like log1pexp that might be available in the library. But since PyMC exposes such methods in its public API, I assume that there is a benefit to using them directly. What would that benefit be?


The former function, log1pexp, utilizes a custom Op, and those–just like specialized NumPy functions–will often perform seemingly standard operations with a lot more care. In other words, the two expressions you’re comparing can produce quite different results in certain cases.
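
As a rough illustration (plain NumPy, not part of the original exchange), the naive formula overflows for large inputs, while a stable formulation does not:

import numpy as np

x = 800.0
print(np.log(1 + np.exp(x)))  # exp(800.) overflows to inf, so this prints inf
print(np.logaddexp(0.0, x))   # a stable way to compute log(1 + exp(x)); prints 800.0

The softplus Op guards against exactly this kind of overflow, which is why the two graphs can behave differently even though they are mathematically equivalent.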

Yes, one of the main purposes of Theano’s rewrites/“optimizations” is to make such replacements automatically.

What I was saying is that I don’t know if the exact rewrites you’re considering are all implemented right now. If some are missing, then we can add them, but we need to be smart about how we add them–especially given the rewrites that are already present.

Here’s a quick way to determine experimentally which optimizations are present (in fast_run mode, at least):

import theano.tensor as tt

from theano import config
from theano.gof.graph import inputs as tt_inputs
from theano.gof.fg import FunctionGraph
from theano.gof.optdb import Query
from theano.compile import optdb

from theano.printing import debugprint as tt_dprint


# We don't need to waste time compiling graphs to C
config.cxx = ""


def optimize_graphs(*graphs, include=["fast_run"], **kwargs):
    # Wrap the output variables in a FunctionGraph so that rewrites can be
    # applied in place, then run the optimizations registered under the given tags
    inputs = tt_inputs(graphs)
    graphs = list(graphs)
    fgraph = FunctionGraph(inputs, graphs, clone=False)
    opts = optdb.query(Query(include=include, **kwargs))
    _ = opts.optimize(fgraph)
    return graphs


x = tt.vector()
y = tt.log(1 + tt.exp(x))
>>> # This will tell us which rewrites were used
>>> with config.change_flags(optimizer_verbose=True):
>>>     # Perform the rewrites
>>>     (z,) = optimize_graphs(y)
local_log1p Elemwise{log,no_inplace}.0 Elemwise{second,no_inplace}.0
local_fill_to_alloc Elemwise{second,no_inplace}.0 Elemwise{log1p,no_inplace}.0
Elemwise{log1p,no_inplace}(Elemwise{exp,no_inplace}(x)) -> softplus(x) Elemwise{log1p,no_inplace}.0 softplus.0
...
inplace_elemwise_optimizer softplus.0 Elemwise{ScalarSoftplus}[(0, 0)].0

>>> tt_dprint(z)
Elemwise{ScalarSoftplus}[(0, 0)] [id A] ''   
 |<TensorType(float64, vector)> [id B]

The output tells us that tt.log(1 + tt.exp(x)) is rewritten to tt.nnet.softplus(x), as desired.


@brandonwillard That is really helpful and it’s how I imagined Theano would work.

This leaves me with my other question. Is there any practical benefit of using softplus directly? Can I trust Theano to always figure out the optimization? Does it lead to any noticeable difference in how fast a graph is optimized/compiled? Or perhaps some other advantage?

Thanks a lot for your input!

It does remove the need to perform the rewrite, which can save some time, but–more generally–these optimizations cannot completely remove the user’s responsibility to specify good, computable graphs.

No. Again, these rewrites do not absolve the user of their need to know how things work. At a certain point, users will need to work directly with these rewrites if they want to produce more efficient graphs. The best we can do is offer a reasonable set of default rewrites that do not cost too much to run.

Just like any other compiler with “optimizations”, there are limitations to what can be automated, and we can’t assume we’ll be able to cover everything. What we can do is provide a programmable platform for people who want to do and share this kind of work in Python.

These are very context-dependent questions that are best answered experimentally–and with profiling.
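
For example, a rough (and entirely unscientific) way to compare the two is to time compilation and use Theano’s built-in profiler; the variable names below are just placeholders and the numbers will depend on your machine and Theano version:

import time

import numpy as np
import theano
import theano.tensor as tt

x = tt.dvector("x")

# Compare how long each graph takes to compile: the naive graph has to be
# rewritten to softplus first, while the second one starts out that way.
start = time.time()
f_naive = theano.function([x], tt.log(1 + tt.exp(x)))
print("naive compile time:", time.time() - start)

start = time.time()
f_softplus = theano.function([x], tt.nnet.softplus(x))
print("softplus compile time:", time.time() - start)

# Runtime differences can be inspected with the profiler
f = theano.function([x], tt.nnet.softplus(x), profile=True)
f(np.random.rand(1000))
f.profile.summary()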


Thank you very much @brandonwillard. I think you have answered all the questions that were puzzling me.