Why does Metropolis evaluate the model twice per step?

Hi,

I was wondering why the Metropolis implementation of pymc3 performs the model forward calculation twice per step. Running the simple example below and inspecting the output shows that “Executed” is printed twice per step.

import numpy as np
import pymc3 as pm
import theano.tensor as tt

from theano.compile.ops import as_op


@as_op(itypes=[tt.dscalar, tt.dscalar],
       otypes=[tt.dvector])
def test_dist(x, y):
    radius = (x - y)**3
    print("Executed")  # marks every forward evaluation of the model
    return np.array([radius])


with pm.Model() as test_model:

    a1 = pm.Normal("a1", mu=5, sigma=2)
    a2 = pm.Normal("a2", mu=10, sigma=2)

    a3 = test_dist(a1, a2)

    a3_obs = pm.Normal("a3_obs", mu=a3,
                       sigma=130, observed=10.0)

    step = pm.Metropolis([a1, a2], blocked=True)

    trace = pm.sample(10, tune=50, init=None, chains=1,
                      cores=1, step=step, return_inferencedata=False)
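(To make the count easy to check, one could also bump a module-level counter inside the op instead of scanning stdout; a small sketch reusing the imports above, with an arbitrary counter name:)

n_evals = 0  # incremented on every forward evaluation

@as_op(itypes=[tt.dscalar, tt.dscalar],
       otypes=[tt.dvector])
def counted_dist(x, y):
    global n_evals
    n_evals += 1
    return np.array([(x - y)**3])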

From my understanding, shouldn’t it only be computed once, and then compared to the previous result?
Or is this for some more advanced feature that goes beyond the basic Metropolis algorithm?

I also noticed that the model is evaluated once if I load an existing trace using pm.load_trace. Why is that needed?

Thanks for the help!

Hi @maxsch95,

there are some scenarios where caching could speed things up, but as soon as blocked/compound sampling (CompoundStep) is involved, caching would violate detailed balance.
In your example there’s blocked=True, so in every iteration the CompoundStep step method iterates over all blocks (two Metropolis steppers, one assigned to a1 and one to a2) and steps them separately before concluding the compound/blocked step.
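If you want to check what you actually end up with, you can instantiate the step method directly and inspect its type; a quick sketch with a throwaway model (b1/b2 are just placeholders):

import pymc3 as pm

# Throwaway model: inspect the step-method objects that pm.Metropolis
# returns for several variables, with and without blocking.
with pm.Model():
    b1 = pm.Normal("b1")
    b2 = pm.Normal("b2")
    print(type(pm.Metropolis([b1, b2], blocked=True)).__name__)
    print(type(pm.Metropolis([b1, b2], blocked=False)).__name__)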

I also noticed that the model is evaluated once if I load an existing trace using pm.load_trace. Why is that needed?

I’m not familiar with that one, and there’s a high chance that we break/remove it at some point. If you have good reasons against return_inferencedata=True, please let us know so we can try to iron out the blockers 🙂

cheers

Hi @michaelosthege,

thank you for the quick response!
However, I am still a little puzzled and would like to ask some follow-up questions for clarification.
First off, what I want is to perform Metropolis steps where all the parameters are updated at once instead of sequentially, one after another.
Could you elaborate on the detailed balance issue a little bit?
From my understanding, at each step we compute the acceptance ratio

$$A(x', x) = \min\!\left(1,\ \frac{P(x')\, g(x \mid x')}{P(x)\, g(x' \mid x)}\right)$$

where in my case x = {a1, a2} are the parameters from the previous step and x' = {a1', a2'} are the newly proposed parameters. P(x) is the product of the likelihood and the priors; we already know P(x) from the previous step and only need to compute P(x'), which requires one forward calculation. Shouldn't the ratio of the g's take care of detailed balance?
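In pseudocode, this is what I would expect a single step to look like (a sketch assuming a symmetric proposal, so the g ratio cancels, and working in log space; logp and propose are hypothetical stand-ins for the model):

import numpy as np

rng = np.random.default_rng()

# Sketch of the step I have in mind (symmetric proposal, log space).
# `logp` and `propose` are stand-ins for the model's logp and proposal.
def metropolis_step(x, logp_x, logp, propose):
    x_new = propose(x)        # draw x' ~ g(. | x)
    logp_new = logp(x_new)    # the one forward calculation per step
    if np.log(rng.uniform()) < logp_new - logp_x:
        return x_new, logp_new
    return x, logp_x          # rejected: keep the cached logp(x)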

Also, you suggested that there would be two Metropolis steppers because I have two input variables a1 and a2, but I noticed that the forward calculation is always performed exactly twice, even if I have 3, or really any arbitrary number of, input parameters. If I omit the blocked keyword, I get a separate stepper for each parameter, but each stepper again performs two forward calculations.
Say I have 3 input parameters a1, a2 and a3, and an observed value a4. I then get 3 Metropolis steppers and 6 model evaluations (“Executed” printed to stdout 6 times) per step.

I am just trying to understand what is going on under the hood. I looked at the source code but it was a little overwhelming. So, would be great if you could elaborate a little bit.
Thank you!

This line creates a callable to compute the ratio: pymc3/metropolis.py at v3.11.2 · pymc-devs/pymc3 · GitHub

It is a function that takes both x' and x: pymc3/metropolis.py at v3.11.2 · pymc-devs/pymc3 · GitHub and pymc3/metropolis.py at v3.11.2 · pymc-devs/pymc3 · GitHub

That’s why you get two evaluations in every iteration, even without blocked sampling.
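Conceptually (just a sketch, not the actual Theano code), the compiled function behaves like this:

import numpy as np

# Hypothetical stand-in for the model's log-probability; in pymc3 this
# is a compiled Theano graph over the raveled parameter vector.
def model_logp(q):
    return -0.5 * np.sum(q ** 2)

# Conceptual sketch of delta_logp: it receives BOTH the proposed point q
# and the current point q0, so the model is evaluated twice per call.
def delta_logp(q, q0):
    return model_logp(q) - model_logp(q0)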

Thanks again for the response!

Those links were really helpful. I now understand why it is computed twice. Does it need to be, though?
logp1 in delta_logp (pymc3/metropolis.py at c1efb7ad51e34e6bd8b292ef226a7e191adbdb82 · pymc-devs/pymc3 · GitHub) obviously needs to be computed, but logp0 should still be known from the previous step, right? That would mean we perform twice as many model evaluations as we actually need.

I realize this probably does not matter for most models, but my evaluation takes around 20 to 30 seconds and I want to perform as few of these calculations as possible.

As long as you step all dimensions at the same time, you can do that. One way to implement it would be a custom class that inherits from pm.Metropolis and overrides the __init__ and astep() methods (astep is the array-level method that the base class’s step() dispatches to).
You can keep track of the logp with a float or list attribute, for example; a rough sketch follows below.
But keep in mind that stepping all dimensions at the same time can be very inefficient, particularly with many dimensions and correlations.
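Here is a minimal sketch of that idea. It assumes this single stepper handles all free variables (so there are no shared replacements to keep in sync) and omits the parent class’s tuning bookkeeping, so treat it as a starting point rather than a drop-in replacement:

import numpy as np
import pymc3 as pm
import theano

from pymc3.step_methods.arraystep import metrop_select
from pymc3.theanof import floatX, join_nonshared_inputs, make_shared_replacements


class CachedLogpMetropolis(pm.Metropolis):
    """Sketch: cache the logp of the current point so each step costs a
    single model evaluation. Assumes this stepper steps ALL free
    variables; the tuning logic of the parent class is omitted."""

    def __init__(self, vars=None, model=None, **kwargs):
        model = pm.modelcontext(model)
        if vars is None:
            vars = model.vars
        super().__init__(vars=vars, model=model, **kwargs)
        # Compile a single-point logp, mirroring what delta_logp does for
        # two points. With all free variables stepped here,
        # make_shared_replacements returns an empty mapping.
        shared = make_shared_replacements(vars, model)
        [logp0], inarray0 = join_nonshared_inputs([model.logpt], vars, shared)
        self._logp = theano.function([inarray0], logp0)
        self._logp_q0 = None  # cached logp of the current point

    def astep(self, q0):
        if self._logp_q0 is None:
            self._logp_q0 = self._logp(q0)  # only on the very first step
        q = floatX(q0 + self.proposal_dist() * self.scaling)
        logp_q = self._logp(q)  # the one model evaluation per step
        accept = logp_q - self._logp_q0
        q_new, accepted = metrop_select(accept, q, q0)
        if accepted:
            self._logp_q0 = logp_q
        self.accepted += accepted
        stats = {
            "tune": self.tune,
            "scaling": self.scaling,
            "accept": np.exp(accept),
            "accepted": accepted,
        }
        return q_new, [stats]

Used as step = CachedLogpMetropolis(blocked=True) inside the model context, test_dist should then execute once per iteration (twice only on the very first step). blocked=True matters: without it you get one stepper per variable again and the all-variables assumption breaks.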


Thank you, that’s great. I am not doubting that it is less efficient. Do you maybe have a reference where I could read up on that?

Maybe as a general remark: the popularity of Bayesian methods in many areas of physics is increasing rapidly. However, evaluating the models in many of those areas takes non-negligible time and can easily become the bottleneck of the analysis. So, wherever possible, caching previous results to avoid reevaluation would be great. I don't know whether this is at all possible in general; I just wanted to leave it as a remark.
Thanks for the help!