After fitting the model the R2 for test data is negative

You’re absolutely right, of course. I believe arviz.r2_score does the computation based on the paper you linked. My post was more of a general tut-tutting about going through all the effort to sample a posterior via MCMC, only to reduce it to a point estimate to compute a metric of interest.

In general, metrics like MAPE, RMSE, etc. are going to have some non-linearity, so Jensen’s Inequality rears its ugly head. What someone likely wants is the posterior mean of the out-of-sample whatever, but what he’s computing is the out-of-sample whatever of the posterior mean, which is not the same thing.
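
To make the distinction concrete, here is a rough sketch with plain NumPy. Everything in it is made up for illustration: `y_true`, `y_draws`, and the noise scale are synthetic stand-ins for held-out data and posterior predictive draws, not anything from the model in this thread. It computes RMSE both ways so you can see they differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not real model output):
# y_true  : held-out observations, shape (n_obs,)
# y_draws : posterior predictive draws, shape (n_draws, n_obs)
n_draws, n_obs = 2000, 50
y_true = rng.normal(size=n_obs)
y_draws = y_true + rng.normal(scale=0.5, size=(n_draws, n_obs))


def rmse(y_hat, y):
    """RMSE over the last axis, so it works per draw or on a single vector."""
    return np.sqrt(np.mean((y_hat - y) ** 2, axis=-1))


# Posterior mean of the metric: compute RMSE for every draw, then average.
rmse_per_draw = rmse(y_draws, y_true)        # shape (n_draws,)
posterior_mean_rmse = rmse_per_draw.mean()

# Metric of the posterior mean: collapse the draws first, then compute RMSE.
rmse_of_posterior_mean = rmse(y_draws.mean(axis=0), y_true)

print("posterior mean of RMSE:", posterior_mean_rmse)
print("RMSE of posterior mean:", rmse_of_posterior_mean)
```

Because RMSE is a convex function of the predictions, the RMSE of the posterior mean can never exceed the posterior mean of the RMSE, so the point-estimate version tends to look more optimistic. Keeping `rmse_per_draw` around also gives you a full distribution for the metric, rather than a single number.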