How to troubleshoot "RuntimeWarning: divide by zero encountered in log"

gc5 · May 19, 2021, 10:04pm

Hi all,

I am running a Hierarchical Model defined as:

where the objective is to estimate, from multinomial counts (k classes) in b samples (stratified in g groups), the posterior distribution of \alpha_g, where each \alpha_g is a vector of size k of independent \Gamma random variables.
The main goal is to estimate and compare the latent \alpha_g posterior distribution between different groups (g groups), for each one of the k classes.

During instantiation of the model and sampling, I receive the following warning:

/Users/user/.pyenv/versions/3.7.4/lib/python3.7/site-packages/pymc3/distributions/transforms.py:461: RuntimeWarning: divide by zero encountered in log
  lx = np.log(x)
/Users/user/.pyenv/versions/3.7.4/lib/python3.7/site-packages/pymc3/distributions/transforms.py:463: RuntimeWarning: invalid value encountered in subtract
  y = lx[:-1] - shift

I am aware that the Gamma distributions in the model are transformed automatically using a log transformation, i.e. when printing the model I can see the following:

Also, the Gamma has support > 0, so I suppose that this is related to the warning, e.g., the log transformation of some Gamma random variables might be trying to take the log of 0 (probably related issue Error with lambda expression · Issue #303 · pymc-devs/pymc3 · GitHub).

However, I cannot understand if this warning affects the inference and/or how to take care of it. Any help on how to approach this problem would be greatly appreciated, thanks!

jonsedar · May 20, 2021, 5:33am

This isn’t the answer, but some thoughts that might help:

The error you’re seeing is thrown inside the StickBreaking class pymc3/transforms.py at d7172c0a1a76301031d1b3b411d00643c416a0c4 · pymc-devs/pymc3 · GitHub which is the transform used by the Dirichlet distribution pymc3/multivariate.py at d7172c0a1a76301031d1b3b411d00643c416a0c4 · pymc-devs/pymc3 · GitHub, which I assume is the primary source of the issue. In that definition you’ll see that a>0, so I hazard a guess that your a is going to zero.

So why might you get a=0? … If your Beta is huge, then a will get very small, but not zero. Possibly there’s a rounding issue but it seems unlikely.

It might help to simplify and test the model with a = pmExponential(... so you can further isolate

gc5 · May 20, 2021, 3:22pm

Hi @jonsedar, thank you for your answer. I think it’s a good point, and the issue arises indeed from that part of the code.

The thing I cannot understand completely is why this error appears before actual sampling, during the instantiation of the model. I checked the starting value for the \alpha_g with:

> model.named_vars['a_kg'].tag.test_value
array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

(the \alpha parameter has shape (k classes, g groups)), and the starting value should be 1 (if I understood correctly the meaning of test_value).

Regarding \alpha=0 I agree that if \beta is huge, I would expect the \alpha to get very small (by definition of Gamma mean Continuous — PyMC3 3.11.2 documentation), and possibly have some rounding issues. Why do you think that rounding issues seem unlikely, and how these issues are usually addressed with pymc3? I tried adding a small offset (but not too small, e.g. 10^{-2}) to the Dirichlet’s \alpha, but I still get the same warning.

I haven’t fully understood your last sentence. If I recall correctly, the Exponential has support [0, \infty), while the \alpha parameter of the Dirichlet should be strictly greater than 0. How would you use the exponential to troubleshoot the issue?

Thank you!

ricardoV94 · May 20, 2021, 4:49pm

Do you observe the warning when doing model.check_test_point()? I think you are seeing a pre-sampling warning when converting the test value. If that call is enough to raise the warning, can you also check if it happens deterministically?

I would expect this to raise the same warning on my end, but it does not:

with pm.Model() as m: 
    beta = pm.Gamma('beta', 1, 1)
    alpha = pm.Gamma('alpha', 1, beta, testval=np.ones((14, 2)), shape=(14, 2)) 
    theta = pm.Dirichlet('theta', alpha) 
    trace = pm.sample()

gc5 · May 20, 2021, 5:08pm

Thanks @ricardoV94 . I am not able to reproduce the warning with model.check_test_point(). I got the following result instead:

beta_log__                         -1.00
a_kg_log__                        -28.00
phi_0.sample1_stickbreaking__     -11.76
phi_1.sample2_stickbreaking__     -11.76
phi_0.sample3_stickbreaking__     -11.76
phi_1.sample4_stickbreaking__     -11.76
phi_0.sample5_stickbreaking__     -11.76
phi_1.sample6_stickbreaking__     -11.76
phi_1.sample7_stickbreaking__     -11.76
phi_0.sample8_stickbreaking__     -11.76
y_0.sample1                     -2417.09
y_1.sample2                     -1920.12
y_0.sample3                     -4130.23
y_1.sample4                     -2355.04
y_0.sample5                     -3863.84
y_1.sample6                     -2367.51
y_1.sample7                     -2371.34
y_0.sample8                     -2131.41
Name: Log-probability of test_point, dtype: float64

I wonder if any of these values are not compatible with the model.

ricardoV94 · May 20, 2021, 5:13pm

What is the shape of your phi variable?

gc5 · May 20, 2021, 5:26pm

Each \phi_b variable (there are b- one for each sample) is defined using a vector \alpha_g of length k (as there are k = 14 classes). I defined it with:

phi_gb = pm.Dirichlet(f'phi_{g}.{b}', a=a_kg[:,g], shape=len(m.classes))

For each \phi_b I use the \alpha_g prior indexed by the group g associated to the sample b: a_kg[:,g]. Each sample is associated with a unique group g.
I hope this is clear, in case I can provide additional information. Thanks!

ricardoV94 · May 20, 2021, 5:30pm

More complicated than I expected. I was just trying to replicate the code so that I could check the warning. Do you think you can share the code needed to generate the model in Colab (or here)? I don’t think the y part is needed, just until the phi.

gc5 · May 20, 2021, 5:45pm

No worries. Do you need only the code or the data too? I can share my code but without the y part I’m unsure if it would reproduce the warning.

import pymc3 as pm

classes = [f'Class{i}' for i in range(14)]
groups = [str(i) for i in range(2)]
samples = [f'sample{i}' for i in range(8)]

n_classes = len(classes)
n_groups = len(groups)
n_samples = len(samples)

# Generate a dummy mapping between samples and groups
sample_group_map = {
    b: g for b, g in zip(samples, ([str(0), str(1)] * (n_samples // 2)))
}

with pm.Model() as model:
    alpha = 1
    beta = pm.Gamma('beta', alpha=1, beta=1)
    a_kg = pm.Gamma('a_kg', alpha, beta, shape=(n_classes, n_groups))
    for b in samples:
        g = {
            c: i for i, c in enumerate(groups)
        }[sample_group_map[b]]
        # obs_b = observed.loc[b]
        # obs_b = obs_b.loc[classes].values
        phi_gb = pm.Dirichlet(f'phi_{g}.{b}', a=a_kg[:,g], shape=n_classes)
        # y_gb = pm.Multinomial(f'y_{g}.{b}', n=n[b], p=phi_gb, observed=obs_b)

ricardoV94 · May 20, 2021, 5:52pm

This part is not working. But is not very critical. By the way which pymc version are you running on?

ricardoV94 · May 20, 2021, 6:05pm

I am not able to replicate it in here: Google Colaboratory

The warning seemed to be from the stick-breaking transform forward_val method which is called pre-sampling (I think) so the data shouldn’t matter here.

gc5 · May 20, 2021, 6:05pm

Sorry, I wanted to make a minimal example but that mapping needed to be corrected.
It should be fixed now in the edited post, although I commented the part relative to the observed y.

I am running:

pymc3: 3.11.2
sys  : 3.7.4 (default, Sep 18 2019, 11:56:18) 
[Clang 10.0.1 (clang-1001.0.46.4)]

Could it be something related to pymc3 version?

ricardoV94 · May 20, 2021, 6:19pm

Does it go away if you pass start=model.test_point to pm.sample?

gc5 · May 20, 2021, 6:45pm

Thanks @ricardoV94 . I think I might have isolated the problem and also why you were not able to reproduce it.

In my code, I am actually defining a class containing helper functions for downstream analysis on the model. There might be an issue with how I used the context manager.

I define the model externally with:

m = ModelClass()
with pm.Model() as model:
    # model definition
    # ...
    
    # before exiting the context, I set the "model" attribute of the m object, i.e., m.model = model
    m.define(model=model)

Then, if I sample directly from outside the model class, i.e., using pm.sample() in the external context manager, I don’t get the warning.

# previous code
with pm.Model() as model:
    # model definition
    # ...
    
    # before exiting the context, I set the "model" attribute of the m object, i.e., m.model = model
    m.define(model=model)

    # additional part: sampling outside of the ModelClass object
    pm.sample(draws=int(5e3), tune=int(2e3), return_inferencedata=True, cores=1, target_accept=.95)

Differently, if I sample by calling a method on the ModelClass():

class ModelClass():
    def sample(
        self,
        draws: int = int(1e5),
        tune: int = int(1e3),
        **kwargs,
    ):
        with self.model:
            self.trace = pm.sample(
                draws=draws, tune=tune, return_inferencedata=True, **kwargs,
            )

by calling (externally) the method with:

m.sample(
    draws=int(5e3), tune=int(2e3), cores=1, target_accept=.95,
    start=m.model.test_point,
)

I get the warning (even setting the start to the test_point).

I am not sure what I’m doing wrong here, but it seems to be related more to how I used the Context manager than to pymc3…

ricardoV94 · May 20, 2021, 7:29pm

Interesting. I wouldn’t expect that to change the behavior but it seems it does. I am afraid you would need to go the debugger route to track it down

gc5 · May 20, 2021, 10:34pm

No worries, thanks!
I will probably post about this behavior on pymc3 github, if it’s something that might need to be addressed by the pymc3 developers. Unless there’s a simple reason why this happens, and a way to fix it.
Thanks for your help!

gc5 · May 21, 2021, 2:29pm

Turns out, it’s not a problem with pymc3. Re-analyzing more closely my code inside the ModelClass() that incorporates my model, there’s another operation that runs before sampling from the posterior, and this is the sampling from the prior predictive distribution pymc3.sampling.sample_prior_predictive (Inference — PyMC3 3.11.2 documentation).
In this function, there is no way to set the start point (probably it doesn’t make sense here). I planned to use this function to inspect the model and check for priors that are too permissive (similar to what is done here Prior and Posterior Predictive Checks — PyMC3 3.10.0 documentation).
In this case, does this RuntimeWarning means that the priors are too permissive? I am able to sample correctly from the posterior (no further errors, divergences, warnings), so should I ignore this warning? Thanks!

ricardoV94 · May 21, 2021, 2:36pm

In short, if sampling diagnostics look good I would ignore the warning during prior_predictive_sampling

gc5 · May 21, 2021, 2:38pm

Thanks for your help! I will follow your advice.

Topic		Replies	Views
What is this about?: step_methods/hmc/quadpotential.py:224: Questions	5	661	July 28, 2021
Problems when fitting a reinforcement learning model to data Questions modeling	4	510	April 20, 2022
Inconsistent graph warning when using Minibatch with DensityDist v5 modeling	2	24	August 21, 2024
Divided by zero encountered when fitting a RL model v5 modeling , aesara	4	558	July 20, 2022
Computing statistics giving runtime warnings Questions	3	1639	May 13, 2022

How to troubleshoot "RuntimeWarning: divide by zero encountered in log"

Related topics