Thanks! I will have a look at that parametrisation for beta.
Some of the parameters I’m interested in don’t have “obvious” distributions like the decay example above. Instead, they can be spread out across their entire possible range, similar to the learning rate, which can take any value in [0,1].
Is there anything else I could do to keep the estimates of some parameters from converging around the same value when using a non-centered parametrisation?
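For concreteness, this is a minimal sketch of the kind of non-centered setup I mean for a parameter bounded in [0,1]. It is written in NumPy rather than in the actual model code. The inverse-logit link and the names `mu`, `sigma` and `z` are assumptions for illustration, not the exact model: each subject's value is built from a group-level location and scale plus a standard-normal offset, then mapped into (0, 1).

```python
import numpy as np

def inv_logit(x):
    """Map an unconstrained real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def noncentered_learning_rates(mu, sigma, z):
    """Hypothetical non-centered parametrisation of per-subject parameters on (0, 1).

    mu, sigma: group-level location and scale on the unconstrained (logit) scale.
    z: per-subject standard-normal offsets, z_i ~ Normal(0, 1).
    Each subject's value is inv_logit(mu + sigma * z_i).
    """
    z = np.asarray(z, dtype=float)
    return inv_logit(mu + sigma * z)

# Simulate 10 subjects with a fairly wide group-level spread.
rng = np.random.default_rng(0)
z = rng.standard_normal(10)
alpha = noncentered_learning_rates(mu=0.0, sigma=1.5, z=z)
```

In this construction, if the group-level `sigma` is estimated close to zero, every subject's value shrinks toward `inv_logit(mu)`. This is one way the per-subject estimates can end up clustered around the same value.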
I’ve been looking for examples of similar issues but can’t find much.
Any ideas or pointers would be greatly appreciated!