Hi,
In another discussion, I read that ravelling (flattening) the posterior samples over chains and draws should be avoided. I didn't fully understand the point, and maybe I misunderstood it, so I would like to clarify it for myself with a simple example: comparing two estimated normal means.
I estimated these means and can access the posterior results in `idata["posterior"]["mu_A"]` and `idata["posterior"]["mu_B"]`.
To compare the two means, I could now do this:
```python
posterior_mu_A = idata_ab_test["posterior"]["mu_A"].values.ravel()
posterior_mu_B = idata_ab_test["posterior"]["mu_B"].values.ravel()
```
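To make the shapes concrete, here is a minimal sketch with simulated numbers standing in for the real posterior (the array below is hypothetical, not taken from the model). It assumes the usual ArviZ layout, where a scalar parameter such as `mu_A` has dimensions `(chain, draw)`; with NumPy's default C order, `.ravel()` simply concatenates the chains one after another.

```python
import numpy as np

# Hypothetical stand-in for idata_ab_test["posterior"]["mu_A"].values:
# a scalar parameter stored with dimensions (chain, draw).
rng = np.random.default_rng(seed=1)
n_chains, n_draws = 4, 1000
mu_A = rng.normal(loc=1.0, scale=0.2, size=(n_chains, n_draws))

# C-order ravel: all draws of chain 0, then chain 1, and so on.
flat_A = mu_A.ravel()

print(mu_A.shape)    # (4, 1000)
print(flat_A.shape)  # (4000,)
# The first n_draws entries of the flattened array are exactly chain 0.
print(np.array_equal(flat_A[:n_draws], mu_A[0]))  # True
```

So the flattened vector still contains every sample; only the chain labels are dropped.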
When I plot these data, it could look like this:
I could then go on to ask, "What is the probability that the difference between the two means is greater than 0.5?" and write the following code to get this probability:
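```python
import numpy as np

epsilon = 0.5
diff = posterior_mu_A - posterior_mu_B        # per-sample difference of the means
mean_diff = np.mean(diff)                     # posterior mean of the difference
prob_diff_greater_epsilon = np.mean(diff > epsilon)  # share of samples with diff > epsilon
```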
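For comparison, the same probability can be computed without ravelling by averaging over both the chain and draw axes, or per chain to see whether the chains agree. This is a sketch on hypothetical simulated arrays of shape `(chain, draw)`, not the actual model output. For a plain average like this, the order of the samples does not matter, so the flattened and unflattened versions give the same number.

```python
import numpy as np

# Hypothetical simulated posteriors with shape (chain, draw); stand-ins
# for idata_ab_test["posterior"]["mu_A"].values and ["mu_B"].values.
rng = np.random.default_rng(seed=2)
mu_A = rng.normal(loc=2.0, scale=0.3, size=(4, 1000))
mu_B = rng.normal(loc=1.3, scale=0.3, size=(4, 1000))
epsilon = 0.5

# Version from the post: flatten first, then average over all samples.
diff_flat = mu_A.ravel() - mu_B.ravel()
prob_flat = np.mean(diff_flat > epsilon)

# Same quantity without ravelling: average over both (chain, draw) axes.
diff = mu_A - mu_B
prob_2d = np.mean(diff > epsilon, axis=(0, 1))

# One estimate per chain, handy for checking that the chains agree.
prob_per_chain = np.mean(diff > epsilon, axis=1)

print(prob_flat, prob_2d)
print(prob_per_chain)
```

With equally long chains, the average of the per-chain estimates also equals the pooled estimate.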
This could be visualized like this:
My question now is whether it was "correct" to `ravel()` the posterior data in the first place. I am essentially treating the ravelled data as independent draws from the same distribution. Assuming the diagnostics for all chains look OK, is this assumption correct, or is there an argument for avoiding it?
Thanks for any hints (and have a nice Christmas time!)
Matthias