How does PyMC handle the curse of dimensionality?

I’m working on a hierarchical model in PyMC that involves a total of 30 parameters, so the posterior distribution will be high-dimensional. I have not implemented it yet and am wondering how PyMC’s NUTS sampler would handle this “curse of dimensionality”. Can NUTS effectively sample such a multidimensional posterior?
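For concreteness, here is a minimal sketch of the kind of model I have in mind (the variable names and data shapes are placeholders):

```python
import numpy as np
import pymc as pm

# Toy data: 10 groups with 20 observations each (placeholder shapes)
rng = np.random.default_rng(0)
group_idx = np.repeat(np.arange(10), 20)
y = rng.normal(0.0, 1.0, size=200)

with pm.Model() as hierarchical_model:
    # Hyperpriors shared across groups
    mu = pm.Normal("mu", 0.0, 1.0)
    sigma_group = pm.HalfNormal("sigma_group", 1.0)

    # One intercept per group; a real model with slopes etc.
    # easily reaches ~30 parameters in total
    theta = pm.Normal("theta", mu, sigma_group, shape=10)

    sigma_obs = pm.HalfNormal("sigma_obs", 1.0)
    pm.Normal("y_obs", theta[group_idx], sigma_obs, observed=y)

    idata = pm.sample()  # NUTS is the default step method
```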

Additionally, I am curious whether other samplers like Replica Exchange MC or Langevin MC would do better than NUTS in this regard. I am not sure whether these samplers are available in PyMC, hence this question. I’d really appreciate any insights, experiences, or implementation examples that anyone can point me to. Thanks a lot :slight_smile:

NUTS can sample way more than 30 parameters. There are experts here who can give you better intuition, but from what I gather it is largely down to its ability to use gradient information, which allows the sampler to move efficiently toward the regions of high posterior mass.
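As a quick sanity check on whatever model you build, you can look at the usual convergence diagnostics after sampling; a sketch, assuming `idata` is the InferenceData returned by `pm.sample()`:

```python
import arviz as az

# Effective sample size and R-hat per parameter:
# r_hat near 1.0 and large ess_bulk indicate the gradient-based
# sampler is mixing well, even with many parameters.
print(az.summary(idata))

# Divergences are the main warning sign with NUTS on hierarchical models
print(int(idata.sample_stats["diverging"].sum()))
```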


Well, it depends on your priors … :grin:

In the end, these probabilistic models have the same expressive power as generic neural networks, although you use them differently. Instead of searching for a solution with a lot of data and large, generic, mostly unbiased building blocks as in a NN, you create smaller specialized structures with higher bias (tight priors).

Whether and how your model copes with high dimensionality depends heavily on your model and prior choices, plus the amount of data you have.
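To make that concrete, “tight priors” just means encoding what you already know about plausible scales; a sketch, where the numbers are placeholders:

```python
import pymc as pm

with pm.Model():
    # Vague prior: the sampler must explore a huge volume
    beta_vague = pm.Normal("beta_vague", mu=0.0, sigma=100.0)

    # Tight prior: a specialized, higher-bias structure that
    # concentrates the posterior mass the sampler has to cover
    beta_tight = pm.Normal("beta_tight", mu=0.0, sigma=0.5)
```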

There is no clear winner among the sampling algorithms for coping with high-dimensional problems either. It’s all trade-offs and matches to more specific modelling problems (e.g., replica exchange MC has an easier time exploring multimodal posteriors, which may correlate with high-dimensional problems).
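Replica exchange MC is not built into PyMC, but tempered Sequential Monte Carlo is, and it targets the same multimodality problem; a minimal sketch on a deliberately bimodal posterior:

```python
import pymc as pm

with pm.Model() as bimodal:
    # Mixture of two well-separated Gaussians: hard for a single
    # gradient-based chain, easier for a tempered population method
    x = pm.NormalMixture(
        "x", w=[0.5, 0.5], mu=[-5.0, 5.0], sigma=[1.0, 1.0]
    )
    idata = pm.sample_smc(draws=2000)
```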


Plain HMC (pymc.step_methods.hmc.HamiltonianMC in the PyMC documentation) is routinely used to sample billions of variables.
For instance, here is a calculation with 3.6 billion variables (64^3 × 128 × 12 × 3^2): [1311.3844] Determination of the $A_2$ amplitude of $K \rightarrow ππ$ decays
And here is one with 12 billion variables (192^4 × 3^2) using SMD (which is not quite plain HMC, but almost): [2111.11544] Master-field simulations of QCD
The thing is that these are simple and deeply understood models, which have physical reasons to behave correctly.
It all depends on the model.
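In PyMC you can request plain HMC instead of NUTS explicitly; a sketch, where the model and tuning parameters are placeholders:

```python
import pymc as pm

with pm.Model() as model:
    x = pm.Normal("x", 0.0, 1.0, shape=30)
    # HamiltonianMC integrates for a fixed path length instead of
    # using NUTS's dynamic termination criterion
    step = pm.HamiltonianMC(path_length=2.0)
    idata = pm.sample(step=step)
```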

In this context, I have never heard Langevin dynamics defended as having better scaling with the number of dimensions. Physicists mostly use it in cases where the HMC formalism simply doesn’t apply, in particular when your logp is not a real number.
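For reference, here is what a Langevin-based sampler looks like: a minimal NumPy sketch of the Metropolis-adjusted Langevin algorithm (MALA), not a PyMC API, targeting a standard Gaussian:

```python
import numpy as np

def mala(logp_and_grad, x0, eps=0.1, n_samples=5000, seed=0):
    """Metropolis-adjusted Langevin: a gradient-driven proposal
    plus an accept/reject step to keep the target distribution exact."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp, g = logp_and_grad(x)
    samples = []
    for _ in range(n_samples):
        # Langevin proposal: drift along the gradient + Gaussian noise
        prop = x + 0.5 * eps**2 * g + eps * rng.normal(size=x.shape)
        lp_p, g_p = logp_and_grad(prop)
        # The proposal is asymmetric, so both directions enter the MH ratio
        fwd = -np.sum((prop - x - 0.5 * eps**2 * g) ** 2) / (2 * eps**2)
        bwd = -np.sum((x - prop - 0.5 * eps**2 * g_p) ** 2) / (2 * eps**2)
        if np.log(rng.uniform()) < lp_p - lp + bwd - fwd:
            x, lp, g = prop, lp_p, g_p
        samples.append(x.copy())
    return np.array(samples)

# Example target: standard normal in 30 dimensions
logp_and_grad = lambda x: (-0.5 * np.sum(x**2), -x)
draws = mala(logp_and_grad, np.zeros(30))
```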

On the other hand, an example of a method that does exist to deal with the curse of dimensionality (and which is not implemented in PyMC) is multilevel Monte-Carlo integration, which applies when some variables are somewhat separated. See for instance INSPIRE.
This still happens within the HMC framework, and could perfectly well happen in NUTS.
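A toy illustration of the idea (my own sketch, not the method of the paper above): if two variables x and y are conditionally independent given a “boundary” variable b, then E[f(x)g(y)] = E_b[E[f(x)|b] · E[g(y)|b]], so the inner averages can be formed independently and multiplied, which squares the effective statistics of the inner level:

```python
import numpy as np

rng = np.random.default_rng(1)
n_outer, n_inner = 200, 100

estimates = []
for _ in range(n_outer):
    b = rng.normal()                      # boundary / separating variable
    x = rng.normal(loc=b, size=n_inner)   # region A: independent of y given b
    y = rng.normal(loc=b, size=n_inner)   # region B: independent of x given b
    # Average each factor separately before multiplying:
    # E[f(x) g(y) | b] = E[f(x) | b] * E[g(y) | b]
    estimates.append(np.mean(np.sin(x)) * np.mean(np.cos(y)))

print(np.mean(estimates))  # two-level estimate of E[sin(x) cos(y)]
```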

Multi-modality is mostly an independent problem, and sometimes you can alleviate it with only a small re-parametrisation of the boundary, while keeping the number of parameters fixed: [1105.4749] Lattice QCD without topology barriers
