After fitting the model, the R2 for test data is negative

That function may be useful. There is no single answer to the question “What is the correct metric to evaluate my model?” I would suggest checking out the notebooks on conducting posterior predictive checks.
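On the negative value itself: with the usual definition, R2 = 1 - SS_res / SS_tot, so it drops below zero whenever the predictions on the test set are worse than simply predicting the mean of the test targets. Here is a minimal sketch of that in plain Python. The `r2_score` helper and all of the numbers are made up for illustration; this is not the function from the thread or any particular library's implementation.

```python
def r2_score(y_true, y_pred):
    """Classical coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out targets and three sets of predictions.
y_test = [1.0, 2.0, 3.0, 4.0]
good_pred = [1.1, 1.9, 3.2, 3.8]   # close to the targets
mean_pred = [2.5, 2.5, 2.5, 2.5]   # always predicts the test mean
bad_pred = [4.0, 3.0, 2.0, 1.0]    # gets the trend backwards

r2_good = r2_score(y_test, good_pred)   # close to 1
r2_mean = r2_score(y_test, mean_pred)   # exactly 0
r2_bad = r2_score(y_test, bad_pred)     # negative

print(r2_good, r2_mean, r2_bad)
```

So a negative test R2 is not a computational error by itself. It says the model generalizes worse than a constant baseline on that data, which is one reason to look at the predictions directly rather than at a single summary number.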