How NUTS evades the curse of dimensionality

When I run my model in NUTS with 20 or 2000 data points, there’s no difference in sampling speed, unlike with, e.g., emcee. I’ve read somewhere that NUTS is not affected by the curse of dimensionality. Is there a good video or an article explaining why this is so?

This talk by Michael Betancourt is really good at explaining the problem: https://www.youtube.com/watch?v=pHsuIaPbNbY. This blog post by Andrew Gelman is also very good: http://andrewgelman.com/2017/03/15/ensemble-methods-doomed-fail-high-dimensions/.

In addition to the things @twiecki mentioned:
The number of parameters or data points usually isn’t a good measure of how hard the problem is. The best measure of performance we have is effective samples per second, which is influenced by these factors:

  • The number of effective samples per sample
  • The number of gradient evaluations per sample
  • The time to compute the gradient (plus some overhead, which matters especially if the gradient itself is very fast)
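
Roughly, the time per sample is the number of gradient evaluations per sample times the time per gradient evaluation (plus overhead), so effective samples per second ≈ (effective samples per sample) / (gradient evaluations per sample × time per gradient evaluation).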

If you want to have a look at those for a particular trace, you can get the effective sample size with pm.effective_n, the number of gradient evaluations from trace.tree_size, and the time it takes to evaluate the gradient with the snippet below:

import numpy as np

logp_dlogp = model.logp_dlogp_function()  # compiled logp and gradient for `model`
logp_dlogp.set_extra_vars({})
x = np.random.randn(logp_dlogp.size)  # a random point in the transformed parameter space
%timeit logp_dlogp(x)  # IPython magic, so run this in IPython/Jupyter
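
Putting those three numbers together, a rough back-of-the-envelope estimate of effective samples per second could look like the sketch below. This assumes `model` and `trace` come from an earlier pm.sample run with NUTS, uses a plain timing loop instead of %timeit so it works outside IPython, and is only an illustration, not part of the original answer:

import time
import numpy as np
import pymc3 as pm

# Factor 1: effective samples per sample (take the worst variable to be conservative)
ess = pm.effective_n(trace)
min_ess = min(np.min(np.asarray(v)) for v in ess.values())
ess_per_sample = min_ess / (len(trace) * trace.nchains)

# Factor 2: average number of gradient evaluations per sample
grad_evals_per_sample = np.mean(trace.tree_size)

# Factor 3: time per gradient evaluation (crude timing loop instead of %timeit)
logp_dlogp = model.logp_dlogp_function()
logp_dlogp.set_extra_vars({})
x = np.random.randn(logp_dlogp.size)
start = time.perf_counter()
for _ in range(1000):
    logp_dlogp(x)
grad_time = (time.perf_counter() - start) / 1000

# Rough estimate of effective samples per second, ignoring overhead outside the gradient
ess_per_second = ess_per_sample / (grad_evals_per_sample * grad_time)
print(ess_per_second)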

Now, let’s say you increase the number of data points without changing the model or the number of parameters (in many models the number of parameters depends on the number of data points, but let’s ignore that here). In many cases the posterior will then get tighter and look more and more like a normal distribution. Unless this introduces correlations, the number of effective samples per sample might increase, and the number of gradient evaluations per sample might decrease, since the sampler no longer has to follow the long tails of the distribution in long trajectories where the step size isn’t well adapted. The time to compute the gradient will increase, however (although maybe not as much as you would expect, since especially for small datasets a large part of the gradient cost is just overhead).
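
To see this concretely, you could run the same model with 20 and 2000 observations and compare the gradient evaluations per sample and the effective sample size. The sketch below is just a toy illustration (a hypothetical normal-mean model, assuming a PyMC3 version where pm.effective_n is available), not something from the original discussion:

import numpy as np
import pymc3 as pm

def run(n):
    data = np.random.randn(n)
    with pm.Model():
        mu = pm.Normal("mu", mu=0.0, sigma=10.0)
        pm.Normal("obs", mu=mu, sigma=1.0, observed=data)
        trace = pm.sample(1000, tune=1000, chains=2, progressbar=False)
    # Average tree size (gradient evals per sample) and effective sample size for mu
    return np.mean(trace.tree_size), pm.effective_n(trace)["mu"]

# Larger n gives a tighter, more normal posterior: tree size stays small (or shrinks)
# and ESS per sample stays high, while each gradient evaluation gets more expensive.
print(run(20))
print(run(2000))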

In short: the sampling speed doesn’t depend so much on the number of data points or parameters as on the shape of the posterior.
