Comparing Posterior Samples - Conceptual Question

Hello

I have two different prediction models (say one for sales in Walmart and another for sales in JC Penney), and from each model I have 5000 samples from the posterior distribution of the percentage error in prediction. I can therefore compute the mean percentage error and the standard deviation of the percentage error. I want to compare whether one model is better at prediction than the other.

The procedure I have in mind is to randomly draw a percentage error value from each posterior 5000 times, compute the differences, and then a) plot the differences and b) compute the HPD/HDI and check whether 0 lies within the 95% HDI/HPD. If it does include 0, then the two models have similar percentage errors; if not, the two are different.
Does that make sense?
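In code, the procedure would look roughly like this minimal sketch (errors_A and errors_B are placeholders for the two posterior sample arrays; I'm assuming ArviZ is available for the HDI):

import arviz as az
import numpy as np

rng = np.random.default_rng(42)

# placeholder posterior samples of percentage error; in practice these come from the two fitted models
errors_A = rng.normal(loc=2.0, scale=1.0, size=5000)
errors_B = rng.normal(loc=2.5, scale=1.2, size=5000)

# randomly pair up draws from the two posteriors and take the difference
diffs = rng.choice(errors_A, size=5000) - rng.choice(errors_B, size=5000)

# a) plot the differences, e.g. az.plot_posterior(diffs) or a histogram
# b) 95% HDI of the differences; check whether it contains 0
hdi_low, hdi_high = az.hdi(diffs[np.newaxis, :], hdi_prob=0.95)
print(hdi_low, hdi_high, hdi_low <= 0 <= hdi_high)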

Any references would be most welcome.

Hi,
if you have two models for the same dataset, you’d typically use a model comparison metric. There are some examples in Model comparison — PyMC3 3.11.4 documentation
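For example, an information-criterion comparison with ArviZ could look like this sketch (idata_A and idata_B are hypothetical InferenceData objects returned by pm.sample() for each model fitted to the same data):

import arviz as az

# rank the two models by LOO (needs the log-likelihood stored in each InferenceData)
comparison = az.compare({"model_A": idata_A, "model_B": idata_B}, ic="loo")
print(comparison)  # table with elpd_loo, model weights, standard errors, ...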

If it’s actually the same model and you’re interested in the probability that a parameter (e.g. sales) for company A is higher than for company B, it depends:

  • If you did one pm.sample(), fitting the PyMC model to all data at the same time, the posterior draws are correlated and you can simply do p_B_greater_A = numpy.mean(posterior.sales_B > posterior.sales_A).
  • If you did multiple pm.sample() runs, fitting the same model to two different datasets, there is no correlation between sales_A and sales_B. Now you want p_B_greater_A given, say, 10_000 posterior draws of sales from the model fitted to company A data and 5_000 posterior draws from the model fitted to company B data. Here you can’t do an elementwise comparison, but need to consider all 10_000 * 5_000 combinations. So “What’s the probability that a random posterior sample from B is greater than a random posterior sample from A?” (see the small NumPy sketch after this list). Doing this exactly for N > 2 candidates is surprisingly difficult to implement, but you can just use my pyrff package for that: p_best = pyrff.sampling_probabilities([posteriorA.sales, posteriorB.sales], correlated=False). See the test case for some simple examples.
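For just two uncorrelated sample sets, the all-pairs probability can be computed directly with NumPy broadcasting; a minimal sketch (samples_A and samples_B stand in for the two 1-D posterior sample arrays):

import numpy as np

rng = np.random.default_rng(0)
# placeholder posterior draws; in practice these come from the two separate pm.sample() runs
samples_A = rng.normal(1.0, 0.5, size=10_000)
samples_B = rng.normal(1.2, 0.5, size=5_000)

# compare every draw of B against every draw of A (5_000 * 10_000 pairs)
p_B_greater_A = np.mean(samples_B[:, None] > samples_A[None, :])
print(p_B_greater_A)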

cheers

@michaelosthege Thanks so much for your response. Indeed the two samples are different (two prediction models, each estimated on a different data set). I was going to go with bootstrap sampling from each set of posterior means - does that make sense? I will check out pyrff tonight.

Sorry, I pasted a reply meant for another thread.

I was going to go with bootstrap sampling from each set of posterior means - does that make sense?

You should always work directly with the posterior samples.
Sounds like you’re doing the 2nd bullet point from the scenarios I mentioned.

This should do the trick:

import pyrff

p_best = pyrff.sampling_probabilities([
    samplesA,  # 👈 NumPy array with shape (chains*draws,)
    samplesB,  # also a 1D array; doesn't need to be the same length though
], correlated=False)