Unfortunately @leka0024 never got a reply here, but I have more or less the same questions.
Maybe someone can help answer them.
My best guesses would be:
- StudentT likelihoods have “nicer” gradients when the prediction is far away from the maximum. As x \to \infty, the gradient of logpdf_T(x) decays towards zero (the tail only falls off like -(\nu + 1)\log|x|), whereas logpdf_{Normal}(x) keeps curving downwards quadratically, so its gradient grows without bound. I could imagine those huge Normal gradients being a reason for divergences; there is a small sketch below that illustrates the difference.
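
To make that concrete, here is a minimal sketch (just an illustration, assuming NumPy/SciPy, standardised distributions with location 0 and scale 1, and an arbitrary choice of \nu = 3) comparing the log-density gradients of the two likelihoods far from the mode:

```python
import numpy as np
from scipy import stats

nu = 3.0                                   # Student-T degrees of freedom (arbitrary choice)
xs = np.array([1.0, 5.0, 20.0, 100.0])     # points increasingly far from the mode

# Analytical gradients of the standardised log-densities:
#   d/dx log N(x | 0, 1)    = -x                          -> grows without bound
#   d/dx log T_nu(x | 0, 1) = -(nu + 1) * x / (nu + x**2) -> decays towards zero
grad_normal = -xs
grad_student = -(nu + 1.0) * xs / (nu + xs**2)

for x, gn, gt in zip(xs, grad_normal, grad_student):
    print(f"x = {x:6.1f}   d/dx logpdf Normal = {gn:9.2f}   Student-T = {gt:7.3f}")

# Sanity check against finite differences of SciPy's logpdfs
eps = 1e-6
fd_normal = (stats.norm.logpdf(xs + eps) - stats.norm.logpdf(xs - eps)) / (2 * eps)
fd_student = (stats.t.logpdf(xs + eps, df=nu) - stats.t.logpdf(xs - eps, df=nu)) / (2 * eps)
assert np.allclose(fd_normal, grad_normal, atol=1e-4)
assert np.allclose(fd_student, grad_student, atol=1e-4)
```

So at x = 100 the Normal log-density has a gradient of -100 and still growing, while the Student-T gradient has already shrunk to roughly -4/100; that is the sense in which I'd call the Student-T gradients "nicer" far from the maximum.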
cheers