Sometimes models work better with init=adapt_full sometimes adapt_diag is just better, and I don’t quite understand in which situations adapt_full is supposed to work better or what does it tell you when it does. I would also like to understand if you can somehow use some of the information of adapt_full, e.g. extract the mass matrix, and obtain from that some intuition on how to reparametrise your model to get adapt_diag to work better.

In the context of Quantum Field Theory, which is what the HMC was created for, I have always used the identity as a mass matrix. And this works. A dense mass matrix would anyway not be possible, even for a small space-time it would not even fit in the memory of any of our supercomputers. And this works despite our models always being very badly parametrised, in the sense that a large part of the game consists precisely in finding the actual degrees of freedom (protons and so on) emerging from the gluon&quark parametrisation, and that’s the whole reason why the HMC exists (otherwise a Taylor expansion would be enough).

Is this mass matrix tuning related to the No U-turn extra layer?

By the way, I have never understood if and how I could use pymc.HamiltonianMC to construct a good-old HMC with a fixed step size without all the fancy auto-tuning (even with a single-level integrator)

Concrety, my current problem is:

I have a model which I started to write with an increasing level of complexity. Basically the top layers of the graph are shared, and then the difference is whether at the end I have:

- A normal likelihood
- A multivariate normal likelihood
- A VAR with multivariate normal innovations

And I am stucked after step 1. This samples in 10 minutes with adapt_full, lasts for days without end with anything adapt_diag based, and somewhat samples in 10 minutes with JAX/Numpyro but not as good as PyMC/adapt_full. I could be fine with that, but the problem is that I cannot use adapt_full for the second model: because of my covariance matrix it seems to be too many variables and then the mass matrix can just not fit in memory. So somehow I need adapt_diag to work on model 1.

My data is approximately 200(components)x200(samples), with the covariance acting in the first dimension so a 200x200 LKJ or WB prior. My data also has plenty of orders of magnitude (mostly decaying exponentially in the first dimension), which might not be ideal in terms of stability.