I have a question about convergence. I searched previous posts for convergence issues, but I have a more general question about NUTS.
Suppose I have a neural network model with prior Normal(0, 1) and likelihood Normal(centered at the target, sd = 0.2).
Does it make sense that convergence is good for a model with 3 hidden nodes, but poor for a model with more nodes?
Does longer tuning improve convergence, or does it have little to do with how long the tuning is?
It is quite likely. Neural networks are usually overparameterized, which results in a multimodal posterior. Such a posterior is often a problem when you are trying to sample from it. Increasing the number of nodes (i.e., parameters) will likely make matters worse.
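To make the multimodality concrete, here is a small sketch (my own illustration, not from the linked thread) of one source of it: permutation symmetry. Relabeling the hidden units of a network leaves its output function unchanged, so every posterior mode has n! mirror copies for n hidden nodes, and the count explodes as you add nodes.

```python
import numpy as np

def mlp(x, W1, b1, W2):
    # tiny 1-hidden-layer network with tanh activation
    return np.tanh(x @ W1 + b1) @ W2

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))     # 5 inputs, 2 features
W1 = rng.normal(size=(2, 3))    # 3 hidden nodes
b1 = rng.normal(size=3)
W2 = rng.normal(size=(3, 1))

perm = [2, 0, 1]                # relabel the hidden units
out_a = mlp(x, W1, b1, W2)
out_b = mlp(x, W1[:, perm], b1[perm], W2[perm, :])

# Different weight vectors, identical network function:
print(np.allclose(out_a, out_b))  # True
```

With 3 hidden nodes there are already 3! = 6 symmetric copies of each mode (plus sign-flip symmetries from tanh); with more nodes the posterior fragments into far more equivalent modes for the sampler to get lost in.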
There is a related discussion on the Stan discourse, "Why are Bayesian Neural Networks multi-modal? - General - The Stan Forums", which contains some good insights.
Consequently, increasing tuning probably wouldn’t help: tuning adapts the step size and mass matrix, but it cannot merge chains that have settled into different modes.
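As a quick illustration of why longer runs don’t fix this (synthetic example, not from the thread): when chains are stuck in different modes, the between-chain variance dominates and the R-hat diagnostic stays far above 1 no matter how many draws you take.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
chain1 = rng.normal(loc=-2.0, scale=0.3, size=n)  # stuck in one mode
chain2 = rng.normal(loc=+2.0, scale=0.3, size=n)  # stuck in a mirror mode

def rhat(chains):
    # basic Gelman-Rubin potential scale reduction factor
    chains = np.asarray(chains)
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()  # within-chain variance
    B = n * means.var(ddof=1)              # between-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

print(rhat([chain1, chain2]))  # far above the usual ~1.01 threshold
```

Doubling the tuning or the number of draws just gives you longer chains in the same two modes; the diagnostic (and the underlying problem) doesn’t improve.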