Initial parameters in NUTS that cannot be changed

Hello,

So I want to know which parameters users CANNOT change when sampling via NUTS. I ran the exact same model twice: one run produced a perfect trace and the other did not converge. It was identical code run two times, which tells me that something in the initialization was the issue.

I saved the perfect trace, but I did not save the tuning steps.

That probably means your model is ill-defined/hard to sample and you didn’t see the problems in one of the runs by chance.

If you increase the number of chains you should see the problem happen every time.

I wouldn’t trust the “perfect” trace in a situation like this.

Yes, I understand that my model is very hard to sample in.

The issue is that other HMC packages (written in Julia) can sample in it and also run successfully every single time. But if PyMC cannot, I need to figure out and explain to a committee why this is the case. I don’t have to get it to work. I just have to explain what the issues are and what things I cannot change in order for it to work.

So I wanted to know what changes between initializations that could cause the run to be good one time and not others.

By default, the initial point is jittered with uniform(-1, 1) noise, which can be a source of run-to-run variation. You can turn this off with the init argument (ask for adapt_diag or adapt_diag_grad instead of jitter+adapt_diag) if you are using the PyMC NUTS sampler.

If you’re in numpyro, blackjax, or nutpie, you have to dig into their documentation. Keyword arguments to the sample functions of these backends can be forwarded via the nuts_sampler_kwargs argument.
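A minimal sketch of both options, assuming a recent PyMC 5.x `pm.sample` API (the toy model and the forwarded numpyro option are purely illustrative):

```python
import pymc as pm

with pm.Model() as model:  # toy model just for illustration
    x = pm.Normal("x", 0, 1)

    # PyMC's own NUTS: drop the uniform(-1, 1) jitter on the start point
    # by asking for "adapt_diag" instead of the default "jitter+adapt_diag".
    idata = pm.sample(init="adapt_diag", random_seed=42)

    # External backends: keyword arguments are forwarded via
    # nuts_sampler_kwargs; what each backend accepts is in its own docs
    # (chain_method below is just an illustrative numpyro option).
    idata_numpyro = pm.sample(
        nuts_sampler="numpyro",
        nuts_sampler_kwargs={"chain_method": "parallel"},
    )
```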

Awesome! Y’all are the best thank you for the help!

By chance, is it still possible to pass in a custom dense mass matrix?

I see this post but I’m not sure if this works in the latest version.


The issue is that other HMC packages (written in Julia) can sample in it and also run every single time.

This is a reason for concern. Perhaps you haven’t converted the model quite correctly? Perhaps the model is very sensitive to initial points? Also, are you sure those samplers are doing a good job of covering the true posterior? Perhaps they look like they are doing a good job but aren’t really.

If it were me I would do the following:

  1. double check that the model was correctly translated
  2. run many more chains in the other languages to see if you also start seeing problems over there. Maybe you were just lucky (which in this case is a form of bad luck)

I just have to explain what the issues are and what things I cannot change in order for it to work.

Besides making the model more sound, you can try to (see the sketch after this list):

  1. increase target_accept
  2. increase the tuning length
  3. provide a custom start point
  4. change the NUTS init strategy
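A rough sketch of how those four knobs map onto `pm.sample` arguments in a recent PyMC version (the toy model and the specific values are only illustrative):

```python
import pymc as pm

with pm.Model() as model:  # toy model just for illustration
    x = pm.Normal("x", 0, 1)

    idata = pm.sample(
        target_accept=0.95,    # 1. higher acceptance target -> smaller steps
        tune=4000,             # 2. longer tuning/adaptation phase
        initvals={"x": 0.5},   # 3. custom start point per free variable
        init="adapt_diag",     # 4. NUTS init strategy without jitter
    )
```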

To go back to your original question: in general, the advice is that if your model diverges at all, even if only in some of the chains, you can’t trust the results, because it usually means the sampler isn’t able to explore the posterior properly (even in the chains where it didn’t diverge!).
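If it helps, a quick way to check this, assuming `idata` is the InferenceData returned by `pm.sample()`:

```python
# "diverging" is recorded per draw in sample_stats
divergences = idata.sample_stats["diverging"]
print("divergences per chain:", divergences.sum(dim="draw").values)
print("total divergences:", int(divergences.sum()))
```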
