I have a question about convergence. I searched previous posts for convergence issues, but I have a more general question about NUTS.
Suppose I have a neural network model with prior Normal(0, 1) and likelihood Normal(centered at the target, sd = 0.2).
Does it make sense that convergence is good for a model with 3 hidden nodes, but poor for a model with more nodes?
Does longer tuning improve convergence, or does it have little to do with how long the tuning is?
It is quite likely. Neural networks are usually overparameterized, which results in a multimodal posterior. Such a posterior is often a problem when you are trying to sample from it. Increasing the number of nodes (i.e., parameters) will likely make matters worse.
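To make the multimodality concrete, here is a small sketch (my own illustration, not from the linked thread) of one source of it: permutation symmetry. Relabeling the hidden units of a network leaves its output function unchanged, so every posterior mode has n! mirror copies for n hidden nodes, and the count explodes as you add nodes.

```python
import numpy as np

def mlp(x, W1, b1, W2):
    # tiny 1-hidden-layer network with tanh activation
    return np.tanh(x @ W1 + b1) @ W2

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))     # 5 inputs, 2 features
W1 = rng.normal(size=(2, 3))    # 3 hidden nodes
b1 = rng.normal(size=3)
W2 = rng.normal(size=(3, 1))

perm = [2, 0, 1]                # relabel the hidden units
out_a = mlp(x, W1, b1, W2)
out_b = mlp(x, W1[:, perm], b1[perm], W2[perm, :])

# Different weight vectors, identical network function:
print(np.allclose(out_a, out_b))  # True
```

With 3 hidden nodes there are already 3! = 6 symmetric copies of each mode (plus sign-flip symmetries from tanh); with more nodes the posterior fragments into far more equivalent modes for the sampler to get lost in.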
There is a related discussion on the Stan discourse, "Why are Bayesian Neural Networks multi-modal? - General - The Stan Forums", which contains some good insights.
Consequently, increasing tuning probably wouldn’t help: tuning adapts the step size and mass matrix, but it cannot merge chains that have settled into different modes.
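As a quick illustration of why longer runs don’t fix this (synthetic example, not from the thread): when chains are stuck in different modes, the between-chain variance dominates and the R-hat diagnostic stays far above 1 no matter how many draws you take.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
chain1 = rng.normal(loc=-2.0, scale=0.3, size=n)  # stuck in one mode
chain2 = rng.normal(loc=+2.0, scale=0.3, size=n)  # stuck in a mirror mode

def rhat(chains):
    # basic Gelman-Rubin potential scale reduction factor
    chains = np.asarray(chains)
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()  # within-chain variance
    B = n * means.var(ddof=1)              # between-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

print(rhat([chain1, chain2]))  # far above the usual ~1.01 threshold
```

Doubling the tuning or the number of draws just gives you longer chains in the same two modes; the diagnostic (and the underlying problem) doesn’t improve.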