# Comparing posterior predictive errors across samples

Hello!
Like many of the questions here, mine is half conceptual and half technical. For my research, I have 4 datasets from 4 countries, and for each I have estimated a Bayesian beta regression model. I am not interested in finding the 'best' model; rather, I want to compare model performance (with a consistent set of variables) between the different countries (hence no hierarchical regression model). Comparing AIC/WAIC etc. is not possible across datasets, so I have taken the median posterior prediction for each observation, subtracted the observed value, and averaged the absolute differences to get the mean absolute error (MAE) of each model.

My 2 questions are:

1. Is using `pm.sample_posterior_predictive` to calculate the MAE and then comparing it across datasets (with different N) appropriate?

2. Is the MAE of how well a model predicts its own data indicative of how well the variables for the model were selected?

This is my first post so if you would like/need more information please do not hesitate to reach out!
Thank you!

Below is the code I use for one of the four countries (the United States).

Here I get the predicted values:

```python
with h1_model:  # the previously estimated Bayesian beta regression model
    ppc1 = pm.sample_posterior_predictive(
        trace_USA, var_names=["y"], random_seed=RSEED)
```

Because there is one predicted `y` value per posterior draw, I take the median across draws:

```python
USA_pred_list = []
for i in range(len(USA)):
    USA_pred_list.append(np.median(ppc1["y"][:, i]))
```
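As an aside, the per-observation loop can be collapsed with NumPy's `axis` argument. A minimal self-contained sketch, using a made-up draws-by-observations array standing in for `ppc1["y"]`:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for ppc1["y"]: 1000 posterior draws for 5 observations.
draws = rng.beta(2.0, 5.0, size=(1000, 5))

# Median over the draw axis (axis=0) gives one point prediction per
# observation -- equivalent to looping over columns with np.median.
pred = np.median(draws, axis=0)

loop_pred = [np.median(draws[:, i]) for i in range(draws.shape[1])]
assert np.allclose(pred, loop_pred)
```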

And here, by subtracting the median prediction from the observed `y` value, taking absolute values, and averaging, I calculate the mean absolute error:

```python
USA_act_list = list(USA["soc_preport_adj"])  # the observed y variable
USA_list = []
for i in range(len(USA)):
    temp = USA_pred_list[i] - USA_act_list[i]
    USA_list.append(temp)
USA_list = [abs(ele) for ele in USA_list]
np.mean(USA_list)
```
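For what it's worth, the whole loop reduces to one vectorized line. A sketch with hypothetical arrays standing in for `USA_pred_list` and `USA["soc_preport_adj"]`:

```python
import numpy as np

# Hypothetical stand-ins for the median predictions and observed values.
pred = np.array([0.30, 0.55, 0.20, 0.70])
actual = np.array([0.25, 0.60, 0.35, 0.65])

# Mean absolute error: average of |prediction - observation|.
mae = np.mean(np.abs(pred - actual))
print(mae)  # 0.075
```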