Alex,
In the iris dataset example, you are using the mean of the predictions to illustrate the accuracy of the model (see the code snippet below).
y_pred = post_checks["yl"].mean(0)
Since the predictions in post_checks["yl"] are the numerical category labels (0, 1, or 2) and not probabilities, is it appropriate to use the average of those labels as a measure of accuracy? In your approach, category "0" makes no contribution to the mean at all. Instead, how about picking the category with the highest posterior probability, as below?
with model_sf:
    pm.set_data({"X": x_s})
    post_checks2 = pm.sample_posterior_predictive(trace_sf, var_names=["θ"], random_seed=RANDOM_SEED)["θ"]
θ_mean2 = post_checks2.mean(0)
y_pred2 = np.argmax(θ_mean2, axis=1)
jitter = np.random.normal(0, 0.03, len(y_s))
plt.figure(figsize=(12, 5))
plt.scatter(y_s + jitter, y_pred2, alpha=0.4)
plt.xticks(range(3), iris.species.unique())
plt.xlabel("Observed category")
plt.yticks(range(3), iris.species.unique())
plt.ylabel("Predicted category")
plt.title("In-sample posterior predictive check");
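To illustrate the concern with made-up numbers (a toy sketch, not the iris posterior): averaging the label values mixes classes together, while taking the most frequent label per observation recovers the intended prediction.

```python
import numpy as np

# Hypothetical posterior draws of predicted labels, shape (n_draws, n_obs),
# for three observations whose true classes are 0, 1, 2.
draws = np.array([
    [0, 1, 2],
    [0, 2, 2],
    [0, 1, 1],
    [0, 1, 2],
])

# Averaging labels treats them as quantities: class 0 contributes nothing,
# and the result is not a valid class label.
label_mean = draws.mean(0)  # array([0.  , 1.25, 1.75])

# Taking the modal (most frequent) label per observation gives a proper
# categorical prediction.
mode_pred = np.array([np.bincount(col, minlength=3).argmax() for col in draws.T])
print(label_mean)  # [0.   1.25 1.75]
print(mode_pred)   # [0 1 2]
```

The same idea motivates `np.argmax(θ_mean2, axis=1)` above: it selects the class with the highest average posterior probability rather than averaging the labels themselves.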
