I am a big believer in cross-validation, so k-fold and leave-one-out cross-validation are my go-to approaches. In that regard, pymc3.stats.loo might be the best metric, since it approximates the leave-one-out cross-validation score of your model from the posterior samples, without refitting the model once per data point. pymc3.stats.waic is also a good metric.
But ultimately, a model's performance should be judged by how well it predicts future data.
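
To make the cross-validation idea concrete, here is a minimal sketch in plain Python of the quantity these metrics try to estimate: the average log predictive density on held-out points. It is not PyMC3 code and not how loo or waic are computed internally. The helper names (`gaussian_logpdf`, `kfold_log_score`), the simulated data, and the simple plug-in Gaussian model (fit by sample mean and standard deviation rather than a full posterior) are all assumptions made for illustration.

```python
import math
import random
import statistics


def gaussian_logpdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def kfold_log_score(data, k, seed=0):
    """Mean held-out log predictive density under k-fold cross-validation.

    The "model" is a plug-in Gaussian (hypothetical stand-in for a real model):
    it is refit on each training split and scored on the held-out fold.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [idx[i::k] for i in range(k)]
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        mu = statistics.mean(train)
        sigma = statistics.stdev(train)
        total += sum(gaussian_logpdf(data[i], mu, sigma) for i in fold)
    return total / len(data)


# Simulated data, purely for illustration.
rng = random.Random(42)
data = [rng.gauss(1.0, 2.0) for _ in range(200)]

five_fold = kfold_log_score(data, k=5)
loo = kfold_log_score(data, k=len(data))  # k = n is exact leave-one-out

print(f"5-fold mean log score: {five_fold:.3f}")
print(f"LOO mean log score:    {loo:.3f}")
```

Higher (less negative) scores mean better out-of-sample predictions. Exact leave-one-out needs one refit per data point, which is cheap here but expensive for a model fit by MCMC, and that cost is why an approximation from a single fit is attractive.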