Comparing Posterior Samples - Conceptual Question

Hello

I have two different prediction models (say one for sales in Walmart and another for sales in JC Penney), and from each model I have 5000 samples from the posterior distribution of the percentage error in prediction. I can therefore compute the mean percentage error and the standard deviation of the percentage error. I want to compare whether one model is better at prediction than the other.

The procedure I have in mind is to randomly draw a percentage error value from each posterior 5000 times, compute the differences, and then a) plot the differences and b) compute the HPD/HDI and check whether 0 lies within the 95% HDI/HPD. If it does include 0, then the two models have similar percentage errors; if not, the two are different.
Does that make sense?
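In code, the procedure would look roughly like this minimal sketch (errors_A and errors_B are placeholders for the two posterior sample arrays; I'm assuming ArviZ is available for the HDI):

import arviz as az
import numpy as np

rng = np.random.default_rng(42)

# placeholder posterior samples of percentage error; in practice these come from the two fitted models
errors_A = rng.normal(loc=2.0, scale=1.0, size=5000)
errors_B = rng.normal(loc=2.5, scale=1.2, size=5000)

# randomly pair up draws from the two posteriors and take the difference
diffs = rng.choice(errors_A, size=5000) - rng.choice(errors_B, size=5000)

# a) plot the differences, e.g. az.plot_posterior(diffs) or a histogram
# b) 95% HDI of the differences; check whether it contains 0
hdi_low, hdi_high = az.hdi(diffs[np.newaxis, :], hdi_prob=0.95)
print(hdi_low, hdi_high, hdi_low <= 0 <= hdi_high)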

Any references would be most welcome.

Hi,
if you have two models for the same dataset, you’d typically use a model comparison metric. There are some examples in Model comparison — PyMC3 3.11.4 documentation
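For example, an information-criterion comparison with ArviZ could look like this sketch (idata_A and idata_B are hypothetical InferenceData objects returned by pm.sample() for each model fitted to the same data):

import arviz as az

# rank the two models by LOO (needs the log-likelihood stored in each InferenceData)
comparison = az.compare({"model_A": idata_A, "model_B": idata_B}, ic="loo")
print(comparison)  # table with elpd_loo, model weights, standard errors, ...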

If it’s actually the same model and you’re interested in the probability that a parameter (e.g. sales) for company A is higher than for company B, it depends:

  • If you did one pm.sample(), fitting the PyMC model to all data at the same time, the posterior draws are correlated and you can simply do p_B_greater_A = numpy.mean(posterior.sales_B > posterior.sales_A).
  • If you did multiple pm.sample() runs, fitting the same model to two different datasets, there is no correlation between sales_A and sales_B. Now you want p_B_greater_A given, say, 10_000 posterior draws of sales from the model fitted to company A data and 5_000 posterior draws from the model fitted to company B data. Here you can’t do an elementwise comparison, but need to consider all 10_000 * 5_000 combinations. So “What’s the probability that a random posterior sample from B is greater than a random posterior sample from A?” (see the small NumPy sketch after this list). Doing this exactly for N > 2 candidates is surprisingly difficult to implement, but you can just use my pyrff package for that: p_best = pyrff.sampling_probabilities([posteriorA.sales, posteriorB.sales], correlated=False). See the test case for some simple examples.
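For just two uncorrelated sample sets, the all-pairs probability can be computed directly with NumPy broadcasting; a minimal sketch (samples_A and samples_B stand in for the two 1-D posterior sample arrays):

import numpy as np

rng = np.random.default_rng(0)
# placeholder posterior draws; in practice these come from the two separate pm.sample() runs
samples_A = rng.normal(1.0, 0.5, size=10_000)
samples_B = rng.normal(1.2, 0.5, size=5_000)

# compare every draw of B against every draw of A (5_000 * 10_000 pairs)
p_B_greater_A = np.mean(samples_B[:, None] > samples_A[None, :])
print(p_B_greater_A)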

cheers

@michaelosthege Thanks so much for your response. Indeed the two samples are different (two prediction models, each estimated on a different data set). I was going to go with bootstrap sampling from each set of posterior means - does that make sense? I will check out pyrff tonight.

Sorry, I pasted a reply meant for another thread.

I was going to go with bootstrap sampling from each set of posterior means - does that make sense?

You should always work directly with the posterior samples.
Sounds like you’re doing the 2nd bullet point from the scenarios I mentioned.

This should do the trick:

import pyrff

p_best = pyrff.sampling_probabilities([
    samplesA,  # 👈 NumPy array with shape (chains*draws,)
    samplesB,  # also a 1D array; doesn't need to be the same length though
], correlated=False)