 # Comparing Posterior Samples - Conceptual Question

Hello

I have two different prediction models (say, one for sales at Walmart and another for sales at JC Penney), and from each model I have 5000 samples from the posterior distribution of the percentage error in prediction. I can therefore compute the mean percentage error and the standard deviation of the percentage error. I want to compare whether one model is better at prediction than the other.

The procedure I have in mind is to randomly draw a percentage-error value from each posterior 5000 times, compute the differences, and then a) plot the differences and b) compute the 95% HPD/HDI and check whether 0 lies within it. If it does include 0, then the two models have similar percentage errors; if not, the two are different.
Does that make sense?
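For concreteness, here is a minimal NumPy sketch of that procedure, using synthetic normal draws as stand-ins for the two real posteriors (the means/sds are made up) and a simple shortest-interval `hdi` helper rather than a library function:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for the 5000 posterior draws of percentage error from each model.
err_walmart = rng.normal(2.0, 1.0, size=5000)
err_jcpenney = rng.normal(3.5, 1.2, size=5000)

# Randomly pair one draw from each posterior and take the difference.
diff = rng.choice(err_walmart, size=5000) - rng.choice(err_jcpenney, size=5000)

def hdi(samples, prob=0.95):
    """Shortest interval containing `prob` of the samples (assumes unimodality)."""
    x = np.sort(samples)
    n = len(x)
    k = int(np.floor(prob * n))
    widths = x[k:] - x[: n - k]
    i = np.argmin(widths)
    return x[i], x[i + k]

lo, hi = hdi(diff, prob=0.95)
contains_zero = lo <= 0 <= hi  # True -> the models' errors look similar
```

A histogram of `diff` (e.g. `plt.hist(diff, bins=50)`) gives the plot from step a). In practice you could also use `arviz.hdi` on the difference array instead of the hand-rolled helper.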
Does that make sense?

Any references would be most welcome.

Hi,
if you have two models for the same dataset, you’d typically use a model comparison metric. There are some examples in the Model comparison page of the PyMC3 3.11.4 documentation.

If it’s actually the same model and you’re interested in the probability that a parameter (e.g. sales) for company A is higher than for company B, it depends:

• If you did one `pm.sample()`, fitting the PyMC model to all data at the same time, the posterior draws are correlated and you can simply do `p_B_greater_A = numpy.mean(posterior.sales_B > posterior.sales_A)`.
• If you did multiple `pm.sample()` calls, fitting the same model to two different datasets, there will be no correlation between `sales_A` and `sales_B`. Now you want `p_B_greater_A` given, say, 10_000 posterior draws of `sales` from a model fitted to company A’s data and 5_000 posterior draws from a model fitted to company B’s data. Here you can’t do an elementwise comparison, but need to consider all `10_000 * 5_000` combinations: “What’s the probability that a random posterior sample from B is greater than a random posterior sample from A?”. Doing this exactly for N>2 candidates is surprisingly difficult to implement, but you can just use my `pyrff` package for that: `p_best = pyrff.sampling_probabilities([posteriorA.sales, posteriorB.sales], correlated=False)`. See the test case for some simple examples.
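A minimal NumPy sketch of both cases, with made-up normal draws standing in for the real posteriors (the means and standard deviations are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Case 1: draws from a single pm.sample() run are paired/correlated,
# so an elementwise comparison is valid.
sales_A = rng.normal(100.0, 10.0, size=4000)
sales_B = sales_A + rng.normal(5.0, 3.0, size=4000)  # correlated with A
p_elementwise = np.mean(sales_B > sales_A)

# Case 2: draws from two separate pm.sample() runs are independent and may
# even differ in length, so compare every pair via broadcasting.
# (10_000 * 5_000 booleans is ~50 MB; fine here, chunk it for larger runs.)
draws_A = rng.normal(100.0, 10.0, size=10_000)
draws_B = rng.normal(105.0, 10.0, size=5_000)
p_all_pairs = np.mean(draws_B[:, None] > draws_A[None, :])
```

The broadcasted comparison in case 2 is what the uncorrelated mode of `pyrff.sampling_probabilities` generalizes to N candidates.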

cheers

@michaelosthege Thanks so much for your response. Indeed the two samples are different (two prediction models, each estimated on a different data set). I was going to go with bootstrap sampling from each set of posterior draws - does that make sense? I will check out pyrff tonight.

```
pyrff.sampling_probabilities([posteriorA.sales, posteriorB.sales], correlated=False)
```