loo would be useful for seeing whether predictive fit declines. Also useful: checking that the sampler produces no divergences. And of course checking whether the posterior distributions are (approximately) the same as before.
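A rough sketch of those three checks using ArviZ in Python, assuming the reference and current fits are available as InferenceData objects with a log-likelihood group (`idata_old`, `idata_new`, and the tolerances are placeholder names and values, not anything canonical):

```python
import arviz as az
import numpy as np

def regression_checks(idata_old, idata_new, elpd_tol=4.0, shift_tol=0.25):
    """Compare a new fit against a reference fit: predictive fit (loo),
    sampler health (divergences), and posterior location."""
    # 1. Predictive fit: flag a drop in elpd_loo beyond a chosen tolerance.
    loo_old = az.loo(idata_old)
    loo_new = az.loo(idata_new)
    drop = loo_old.elpd_loo - loo_new.elpd_loo
    assert drop < elpd_tol, f"elpd_loo dropped by {drop:.1f}"

    # 2. Sampler health: no divergent transitions in the new fit.
    n_div = int(idata_new.sample_stats["diverging"].sum())
    assert n_div == 0, f"{n_div} divergent transitions"

    # 3. Posterior stability: each parameter's posterior mean moved by
    #    less than shift_tol reference posterior SDs.
    old, new = az.summary(idata_old), az.summary(idata_new)
    shift = np.abs(new["mean"] - old["mean"]) / old["sd"]
    assert (shift < shift_tol).all(), f"largest shift: {shift.max():.2f} sd"
```

The elpd comparison could instead go through az.compare, which also reports a standard error for the difference rather than relying on a fixed tolerance.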
One practical problem: sampling is compute-intensive, so it is not practical to re-run sampling many times with different observed values. Who wants to wait for hours before getting feedback on whether something is newly broken? Tossing the work onto AWS (or another cloud provider) could solve that problem, at some cost.
Currently, I run a test or two by hand and eyeball the results. That’s better than nothing, but sometimes I miss an issue. When I later catch it, I have to check through old versions to see when the issue first appeared.
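Scripting even part of that eyeballing makes it easier to pin the regression to a version later (e.g., with git bisect). A minimal sketch as a pytest test, assuming az.summary() output from a known-good fit was saved once as a reference CSV; fit_model() and the file name are hypothetical stand-ins:

```python
# test_posterior_regression.py -- illustrative only; fit_model() and the
# reference file are hypothetical stand-ins for the real project.
import arviz as az
import numpy as np
import pandas as pd

REFERENCE = "reference_summary.csv"  # saved once from a known-good fit

def fit_model(seed=1):
    """Placeholder: fit the real model on a small fixed dataset and
    return an arviz.InferenceData."""
    raise NotImplementedError

def test_posterior_matches_reference():
    new = az.summary(fit_model(seed=1))
    ref = pd.read_csv(REFERENCE, index_col=0)

    # Same parameter names, and posterior means within a few MCSEs of the
    # reference values -- a rough allowance for Monte Carlo noise.
    assert list(new.index) == list(ref.index)
    tol = 3 * np.maximum(new["mcse_mean"], ref["mcse_mean"])
    assert (np.abs(new["mean"] - ref["mean"]) <= tol).all()
```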
Does anyone do better?