Parameter dimensionality vs. model evaluations

To perform Bayesian inference on a d-dimensional parameter vector using MCMC (Metropolis-Hastings, Slice, or NUTS), how many model evaluations are needed to draw N samples from the posterior?

It depends on the likelihood surface and the parametrization. If you want to compare samplers on a specific model, override Distribution.logp so that it increments a global counter on every evaluation.
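For illustration, here is a minimal, library-agnostic sketch of that counting idea. The wrapper class and names are hypothetical; in PyMC you would patch Distribution.logp analogously:

```python
import numpy as np

class CountingLogp:
    """Wrap a log-probability callable and count how often it is evaluated."""

    def __init__(self, logp):
        self.logp = logp
        self.n_evals = 0

    def __call__(self, theta):
        self.n_evals += 1
        return self.logp(theta)

# Usage: a standard 2-d Gaussian log-density as a toy target.
target = CountingLogp(lambda theta: -0.5 * np.dot(theta, theta))
target(np.zeros(2))
print(target.n_evals)  # -> 1
```

Hand `target` to your sampler in place of the raw logp, run it, and read off `target.n_evals` at the end.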

As @chartl said, it depends hugely on the model’s logp. From a complexity point of view, NUTS is far more expensive per sample than Metropolis-Hastings. In the worst case, each sample drawn with NUTS can require up to (d+1) * 2^max_tree_depth evaluations (supposing that each component of the gradient of the model’s logp, and the logp itself, are computed independently of each other and have similar cost): the tree-doubling procedure performs up to 2^max_tree_depth leapfrog steps, and each step needs the gradient. Metropolis-Hastings, by contrast, does not use the gradient and does not perform leapfrog integration; it evaluates the model’s logp only once at each step of the Markov chain. So what you’ll see is that Metropolis-Hastings usually draws samples faster than NUTS.
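To make the gap concrete, here is a back-of-the-envelope worked example, assuming the worst-case bound above and a max_tree_depth of 10 (a common default):

```python
# Worst-case evaluations per posterior sample, under the assumption that each
# gradient component costs about one logp-like evaluation.
d = 10                # parameter dimensionality
max_tree_depth = 10   # common default tree depth cap for NUTS

nuts_worst_case = (d + 1) * 2 ** max_tree_depth  # (10 + 1) * 1024 = 11264
mh_per_sample = 1                                # one logp evaluation per proposal

print(nuts_worst_case, mh_per_sample)  # 11264 vs 1
```

In practice NUTS rarely hits the depth cap, but even typical trajectories cost orders of magnitude more evaluations per draw than a single Metropolis proposal.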

However, there is a crucial distinction that cannot be captured in algorithmic-complexity terms. The samples drawn with NUTS are far less correlated, explore the full space of probable parameters more efficiently, and converge faster to the stationary distribution than those from Metropolis-Hastings. The latter easily gets stuck in pathological regions of the parameter space, explores very slowly, and produces consecutive samples that are highly correlated with each other, so you usually have to thin the resulting chain. In the end, the only practical way to compare the two methods is to run them on the same model, see how many samples and how much thinning Metropolis-Hastings needs to produce chains of similar quality to NUTS, and compare the time each run takes (see the sketch below). One last thing: Metropolis-Hastings’ uninformed proposals perform worse and worse as the dimensionality increases, whereas NUTS stays mostly the same.
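As a rough sketch of such a comparison, assuming PyMC3 with ArviZ (exact argument names vary slightly across versions):

```python
import time

import arviz as az
import pymc3 as pm

with pm.Model():
    # Toy 10-dimensional standard-normal posterior.
    pm.Normal("x", mu=0.0, sigma=1.0, shape=10)

    t0 = time.perf_counter()
    trace_mh = pm.sample(5000, step=pm.Metropolis(), chains=2,
                         return_inferencedata=True, progressbar=False)
    mh_seconds = time.perf_counter() - t0

    t0 = time.perf_counter()
    trace_nuts = pm.sample(5000, chains=2,  # NUTS is the default for continuous models
                           return_inferencedata=True, progressbar=False)
    nuts_seconds = time.perf_counter() - t0

# Effective samples per second is the fair metric: raw draw counts flatter
# Metropolis because its consecutive draws are highly autocorrelated.
print(az.ess(trace_mh).x.values.mean() / mh_seconds)
print(az.ess(trace_nuts).x.values.mean() / nuts_seconds)
```

On anything but trivial targets, NUTS typically wins on effective samples per second even though each individual draw costs many more logp evaluations.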
