Hi,
I was wondering why the Metropolis implementation of pymc3 performs the model forward calculation twice per step.
Running the simple example below and inspecting the output shows that “Executed” is printed twice per step:
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano.compile.ops import as_op

@as_op(itypes=[tt.dscalar, tt.dscalar],
       otypes=[tt.dvector])
def test_dist(x, y):
    radius = (x - y)**3
    print("Executed")
    return np.array([radius])

with pm.Model() as test_model:
    a1 = pm.Normal("a1", mu=5, sigma=2)
    a2 = pm.Normal("a2", mu=10, sigma=2)
    a3 = test_dist(a1, a2)
    a3_obs = pm.Normal("a3_obs", mu=a3,
                       sigma=130, observed=10.0)
    step = pm.Metropolis([a1, a2], blocked=True)
    trace = pm.sample(10, tune=50, init=None, chains=1,
                      cores=1, step=step, return_inferencedata=False)
From my understanding, shouldn’t it only be computed once, and then compared to the previous result?
Or is this for some more advanced feature that goes beyond the basic Metropolis algorithm?
I also noticed that the model is evaluated once if I load an existing trace using pm.load_trace. Why is that needed?
Thanks for the help!
Hi @maxsch95 ,
there are some scenarios where caching could speed things up, but as soon as blocked sampling (CompoundStep) is involved, caching would violate detailed balance.
In your example there’s blocked=True, so in every iteration the CompoundStep step method iterates over all blocks (2 Metropolis steppers assigned to a1 and a2, respectively) and steps them separately, before concluding the compound/blocked step.
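Schematically, the compound step does something like this (a simplified paraphrase with a hypothetical class name, not the actual pymc3 source):

# Simplified sketch of compound/blocked stepping (paraphrased, not the
# literal pymc3 CompoundStep implementation):
class CompoundStepSketch:
    def __init__(self, methods):
        self.methods = methods  # e.g. one Metropolis stepper per block

    def step(self, point):
        # Blocks are stepped in sequence; later steppers see the point
        # already updated by earlier ones, so a logp cached at the start
        # of the iteration would go stale halfway through.
        for method in self.methods:
            point = method.step(point)
        return point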
> I also noticed that the model is evaluated once if I load an existing trace using pm.load_trace. Why is that needed?
I’m not familiar with that one and there’s a high chance that we break/remove it at some point. If you have good reasons against return_inferencedata=True, please let us know so we can try to iron out the blockers.
cheers
Hi @michaelosthege ,
thank you for the quick response!
However, I am still a little puzzled and would like to ask some follow-up questions for clarification.
First off, what I want is to perform Metropolis steps where all the parameters are updated at once instead of being updated sequentially one after another.
Could you elaborate on the detailed balance a little bit?
From my understanding, at each step we compute the acceptance ratio

A(x \to x') = \min\left(1, \frac{P(x')\, g(x \mid x')}{P(x)\, g(x' \mid x)}\right)

where in my case x = {a1, a2} are the parameters from the previous step and x' = {a1', a2'} are the newly proposed parameters. P(x) is the product of the likelihood and the priors; we already know P(x) from the previous step and only need to compute P(x'), which requires one forward calculation. Shouldn’t the ratios of the g’s take care of the detailed balance?
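(With a symmetric random-walk proposal, g(x \mid x') = g(x' \mid x), so this reduces to A(x \to x') = \min\left(1, P(x')/P(x)\right), and only P(x') needs a fresh evaluation.)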
Also, you suggested that there would be two Metropolis steppers because I have two input variables a1 and a2, but I noticed that the forward calculation is always performed twice, even if I have 3 (or any arbitrary number of) input parameters. If I omit the blocked keyword, I get a separate stepper for each parameter, but each stepper again performs two forward calculations.
Say I have 3 input parameters a1, a2 and a3, and an observed value a4. Then I get 3 Metropolis steppers and 6 model evaluations (“Executed” printed to stdout 6 times) per step.
I am just trying to understand what is going on under the hood. I looked at the source code, but it was a little overwhelming, so it would be great if you could elaborate a bit.
Thank you!
This line creates a callable to compute the ratio: pymc3/metropolis.py at v3.11.2 · pymc-devs/pymc3 · GitHub
It is a function that takes both x' and x: pymc3/metropolis.py at v3.11.2 · pymc-devs/pymc3 · GitHub and pymc3/metropolis.py at v3.11.2 · pymc-devs/pymc3 · GitHub
That’s why you get two evaluations in every iteration, even without blocked sampling.
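Schematically, the compiled function is built like this (a toy stand-in: the quadratic logp below replaces the real graph built from model.logpt):

import theano
import theano.tensor as tt

q_new = tt.dvector("q_new")
q_old = tt.dvector("q_old")

# Toy stand-in for the model's log-probability graph; in pymc3 it is
# built from model.logpt.
def logp(q):
    return -0.5 * tt.sum(q ** 2)

# One compiled function of BOTH points: every call evaluates the logp
# graph at q_new and at q_old, hence two model evaluations per step.
delta_logp = theano.function([q_new, q_old], logp(q_new) - logp(q_old))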
Thanks again for the response!
Those links were really helpful. I now understand why it is computed twice. Does it need to be, though?
logp1 in delta_logp (pymc3/metropolis.py at c1efb7ad51e34e6bd8b292ef226a7e191adbdb82 · pymc-devs/pymc3 · GitHub) obviously needs to be computed, but logp0 should still be known from the previous step, right? That would mean we perform twice as many model evaluations as we actually need.
I realize this probably does not matter for most models, but my evaluation takes around 20 to 30 seconds and I want to perform as few of these calculations as possible.
As long as you step all dimensions at the same time, you can do that. One way to implement it would be with a custom class that inherits from pm.Metropolis and overrides the __init__ and step() methods.
You can keep track of the logp with a float or list attribute, for example.
But keep in mind that stepping all dimensions at the same time can be very inefficient, particularly with many dimensions and correlations.
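To illustrate the caching idea without any pymc3 internals, here is a minimal standalone sketch (the function name and arguments are hypothetical, and logp stands in for your expensive model evaluation):

import numpy as np

def metropolis_cached(logp, x0, n_steps, scale=0.1, seed=None):
    # Minimal blocked Metropolis sketch: all dimensions move at once and
    # only ONE model evaluation happens per step.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    logp_x = logp(x)  # evaluated once up front, then cached
    samples = np.empty((n_steps, x.size))
    for i in range(n_steps):
        x_new = x + scale * rng.standard_normal(x.size)  # symmetric proposal
        logp_new = logp(x_new)  # the only model evaluation this step
        if np.log(rng.uniform()) < logp_new - logp_x:
            x, logp_x = x_new, logp_new  # accept: cache the new logp
        samples[i] = x  # on rejection, the old point is recorded again
    return samples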
Thank you, that’s great. I am not doubting that it is less efficient. Do you maybe have a reference where I could read up on that?
Maybe as a general remark: the popularity of Bayesian methods in many areas of physics is increasing rapidly. However, model evaluations in many of those areas take non-negligible time and can easily become the bottleneck of the analysis. So, wherever possible, caching previous results to avoid reevaluation would be great. I don’t know whether this is at all possible; I just wanted to leave it as a remark.
Thanks for the help!