Does this mean it is less stable than MCMC on the same dataset, or less stable than it would be with a larger dataset?
May need smaller learning rates?
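A minimal sketch of why a smaller learning rate can help. It assumes the method being compared to MCMC is fit by gradient-based optimization, which the note does not say. It uses a hypothetical toy quadratic loss, not the actual model. On f(x) = a·x²/2, each gradient step multiplies x by (1 − lr·a), so plain gradient descent converges only when 0 < lr < 2/a. Above that threshold, the same problem diverges.

```python
def gradient_descent(a, lr, x0=1.0, steps=50):
    """Minimize the toy loss f(x) = a * x**2 / 2 with plain gradient descent.

    The gradient is a * x, so each step multiplies x by (1 - lr * a).
    The iterates shrink toward 0 only when |1 - lr * a| < 1, i.e. 0 < lr < 2 / a.
    """
    x = x0
    for _ in range(steps):
        x = x - lr * a * x  # one gradient step
    return x


a = 10.0  # curvature of the toy loss (hypothetical value); threshold is 2 / a = 0.2
for lr in (0.25, 0.15, 0.05):
    print(f"lr={lr}: x after 50 steps = {gradient_descent(a, lr):.3e}")
```

Here lr=0.25 blows up because |1 − 2.5| = 1.5 > 1. Both smaller rates shrink toward 0, and lr=0.15 oscillates in sign on the way.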