Don't Use Metropolis

Just wrote a new short post on why you should not use the Metropolis sampler. I compare a model sampled with the No-U-Turn Sampler (NUTS) to the same model sampled with Metropolis, and show how you might notice that the sampler did a bad job.

https://colindcarroll.com/2018/01/01/bad-traces-or-dont-use-metropolis/
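To make the failure mode concrete here in the thread (this is a hypothetical illustration in plain NumPy, not the code from the linked post): random-walk Metropolis with an isotropic proposal struggles on a strongly correlated Gaussian, and the lag-1 autocorrelation of the trace makes the "bad trace" visible numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly correlated 2-D Gaussian target -- the kind of geometry
# where random-walk Metropolis produces bad traces.
cov = np.array([[1.0, 0.99], [0.99, 1.0]])
prec = np.linalg.inv(cov)

def logp(x):
    """Unnormalized log density of the correlated Gaussian."""
    return -0.5 * x @ prec @ x

def metropolis(n_samples, step=0.3):
    """Plain random-walk Metropolis with an isotropic Gaussian proposal."""
    x = np.zeros(2)
    trace = np.empty((n_samples, 2))
    accepted = 0
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal(2)
        if np.log(rng.uniform()) < logp(proposal) - logp(x):
            x = proposal
            accepted += 1
        trace[i] = x
    return trace, accepted / n_samples

trace, acc_rate = metropolis(5000)

# Lag-1 autocorrelation of the first coordinate: values near 1 mean the
# chain barely moves between draws -- successive samples are nearly copies.
x0 = trace[:, 0] - trace[:, 0].mean()
lag1 = (x0[:-1] @ x0[1:]) / (x0 @ x0)
print(f"acceptance rate: {acc_rate:.2f}, lag-1 autocorrelation: {lag1:.2f}")
```

A gradient-based sampler like NUTS moves along the correlated ridge instead of proposing blindly, so its draws decorrelate far faster on the same target.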


Great post; a couple of comments below. I’m not an expert, but IIRC everyone agrees that NUTS and HMC beat Metropolis, at least for continuous variables. To me, the important follow-up questions are: 1. What about discrete and/or mixed models? 2. How much better is NUTS than HMC?

Yep, I think those are the more interesting questions. I see a lot of questions in issues and on Stack Overflow where users manually assign Metropolis as a step method, and this post was mostly meant as evidence that you should not do that unless you have a good reason to. One of my favorite textbook series is “Counterexamples in X”, and this was an attempt at a concrete example of where Metropolis fails.

I think your two questions would be particularly interesting if you could find examples that were surprising:

  1. PyMC3 will automatically use NUTS for anything continuous and Metropolis for anything discrete, and I do not know of any situation where using Metropolis for all the variables would do better.

  2. You would only use HMC if you wanted to tune a few extra parameters by hand. I would be interested to see a situation in which hand-tuned HMC performs well enough to justify the hand tuning.
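For anyone following along, the parameters you would tune by hand are the leapfrog step size and the number of leapfrog steps, which NUTS chooses automatically. A minimal sketch of that kind of hand-tuned HMC (hypothetical NumPy code, not PyMC3’s implementation) on a standard normal target:

```python
import numpy as np

rng = np.random.default_rng(1)

def logp(x):
    """Standard normal target: logp = -x**2 / 2 (up to a constant)."""
    return -0.5 * x @ x

def grad_logp(x):
    return -x

def hmc_step(x, step_size=0.2, n_leapfrog=20):
    """One HMC transition: the two keyword arguments are the hand-tuned knobs."""
    p = rng.standard_normal(x.shape)              # resample momentum
    x_new, p_new = x.copy(), p.copy()
    p_new += 0.5 * step_size * grad_logp(x_new)   # half step for momentum
    for _ in range(n_leapfrog - 1):
        x_new += step_size * p_new                # full step for position
        p_new += step_size * grad_logp(x_new)     # full step for momentum
    x_new += step_size * p_new
    p_new += 0.5 * step_size * grad_logp(x_new)   # final half step
    # Metropolis accept/reject on the joint (position, momentum) energy
    h_old = -logp(x) + 0.5 * p @ p
    h_new = -logp(x_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h_old - h_new:
        return x_new, True
    return x, False

x = np.zeros(2)
draws, n_accept = [], 0
for _ in range(2000):
    x, accepted = hmc_step(x)
    n_accept += accepted
    draws.append(x)
draws = np.array(draws)
print(f"acceptance: {n_accept / 2000:.2f}, sample mean: {draws.mean(axis=0)}")
```

On a harder target, a badly chosen step size makes the integrator unstable (acceptance collapses) and a badly chosen trajectory length wastes gradient evaluations, which is exactly the tuning burden NUTS removes.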

We might also want to revisit the comparison between HMC and NUTS when #2677 is merged. Right now HMC doesn’t have mass-matrix adaptation or dual averaging for the step size. From some profiling, it looks like in some cases there is noticeable overhead in all the bookkeeping NUTS requires, and that wouldn’t be a problem for HMC.

Your comment matches my own experience in molecular simulation, where the improved sampling of NUTS-like methods just never seemed to compensate for their greater expense. Of course, the log-likelihoods are quite different, as are the relative costs of likelihood and gradient calculations.

A bit unrelated, but I just tried version 3.9 and noticed that the chains look very noisy in a model with Metropolis(); the exact same model run with pymc3 version 3.8 gave much ‘smoother’ chains.
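Worth noting that for Metropolis, a “smoother” trace is often a *worse* one: the chain is rejecting proposals and repeating values, which shows up as high autocorrelation rather than good mixing. One way to check which trace actually mixes better is a crude integrated autocorrelation time estimate (a hypothetical sketch, with synthetic chains standing in for the two versions’ output):

```python
import numpy as np

def autocorr_time(chain, max_lag=200):
    """Crude integrated autocorrelation time: larger means worse mixing."""
    x = chain - chain.mean()
    var = x @ x / len(x)
    tau = 1.0
    for lag in range(1, max_lag):
        rho = (x[:-lag] @ x[lag:]) / (len(x) * var)
        if rho < 0.05:            # truncate once correlation is negligible
            break
        tau += 2.0 * rho
    return tau

rng = np.random.default_rng(2)

# A "noisy" chain: i.i.d. draws, the ideal case.
fast = rng.standard_normal(5000)

# A "smooth" chain: sticky AR(1) process mimicking a chain that rarely moves.
slow = np.empty(5000)
slow[0] = 0.0
for i in range(1, 5000):
    slow[i] = 0.95 * slow[i - 1] + np.sqrt(1 - 0.95**2) * rng.standard_normal()

print(f"noisy chain tau:  {autocorr_time(fast):.1f}")
print(f"smooth chain tau: {autocorr_time(slow):.1f}")
```

If the 3.9 traces have a *lower* autocorrelation time than the 3.8 ones despite looking noisier, the new behavior may actually be an improvement; if not, it could be worth opening an issue with the model.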