Are my posterior predictive samples biased if I observe Y?

Hi there!

As a newly converted Bayesian, one of my learning strategies is transferring concepts from machine learning. Currently I am working on a binary classification project and have built a decent working model in PyMC3. My likelihood looks something like this:

y = pm.Bernoulli('y', p=pm.invlogit(alpha + p*beta), observed=y_data)

So I have my posterior, but I also need unbiased point estimates for Y. In traditional ML I would run K-fold cross-validation and use the out-of-fold predictions as my in-sample predictions. Such a process seems cumbersome and questionable in PyMC3, especially given that my model takes around two hours to sample.
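For reference, the out-of-fold scheme I mean looks roughly like this (a numpy-only sketch; the `fit`/`predict` pair is a made-up stand-in for any model, not a PyMC3 API):

```python
import numpy as np

def out_of_fold_predictions(X, y, fit, predict, k=5, seed=0):
    """Return predictions where each row is predicted by a model
    that never saw that row during fitting."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    preds = np.empty(len(y), dtype=float)
    for fold in folds:
        train = np.setdiff1d(idx, fold)          # everything outside the fold
        model = fit(X[train], y[train])          # refit without the fold
        preds[fold] = predict(model, X[fold])    # predict the held-out fold
    return preds

# Toy check: the "model" is just the training-set mean of y.
X = np.arange(20).reshape(-1, 1).astype(float)
y = (np.arange(20) % 2).astype(float)
oof = out_of_fold_predictions(X, y, fit=lambda X, y: y.mean(),
                              predict=lambda m, X: np.full(len(X), m))
```

With a two-hour sampling time, doing this k times for a PyMC3 model is indeed the painful part.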

So I have two questions here:

First, given that I observe Y in the model, I am wondering if running sample_posterior_predictive is truly producing “unbiased” point estimates.

Second, I found a function in ArviZ called loo that I’ve been experimenting with and am wondering if it could be a solution if I set pointwise=True. From my understanding, it uses leave one out CV to produce an expected log pointwise predictive density. Is there some way I can transform this to an unbiased point estimate analogous to the out-of-fold predictions in ML?

Any guidance or resources are much appreciated, thanks in advance!

You should be able to use the y samples you get from sample_posterior_predictive directly. The main difference is that you don’t get a point estimate but the whole distribution for them. You could take the mean/median to get a single point estimate. What do you need the predictions for?
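As a rough illustration of that reduction step (plain numpy, with a fake array of Bernoulli draws standing in for real sample_posterior_predictive output):

```python
import numpy as np

# Pretend these are posterior predictive samples from PyMC3:
# shape (n_draws, n_obs), each entry a 0/1 draw of y.
rng = np.random.default_rng(42)
true_p = np.array([0.1, 0.5, 0.9])
ppc_y = rng.binomial(1, true_p, size=(4000, 3))

# Point estimate per observation: the mean over draws, i.e. the
# posterior predictive probability that y = 1.
p_hat = ppc_y.mean(axis=0)

# Uncertainty comes for free as the spread over draws.
p_sd = ppc_y.std(axis=0)
```

The full `ppc_y` array is the distribution; `p_hat` is the collapsed point estimate.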

I also don’t understand what you mean by unbiased so I can’t help or give any opinion on that. Could you give some definition or reference about what you mean?

As for loo, you can use it to compare models or to get weights for model averaging, but not to get predictions. The predictions come from the posterior predictive; loo essentially uses them to estimate the predictive accuracy of your model.
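To show what the “pointwise predictive density” ingredient looks like, here is a numpy sketch. Caveat: this computes the plain in-sample log pointwise predictive density via a log-mean-exp over draws; ArviZ’s loo additionally applies a Pareto-smoothed importance-sampling correction to approximate leave-one-out, which is not reproduced here.

```python
import numpy as np

# Fake posterior draws of p for 3 observations, shape (n_draws, n_obs),
# standing in for what a fitted PyMC3 model would give.
rng = np.random.default_rng(0)
p_draws = rng.beta(2, 2, size=(2000, 3))
y_obs = np.array([0, 1, 1])

# Pointwise Bernoulli log-likelihood of each observation under each draw.
log_lik = y_obs * np.log(p_draws) + (1 - y_obs) * np.log1p(-p_draws)

# Log pointwise predictive density: log of the mean likelihood over
# draws, computed stably (log-mean-exp trick).
m = log_lik.max(axis=0)
lpd = m + np.log(np.exp(log_lik - m).mean(axis=0))
```

Each entry of `lpd` is a per-observation density score, not a prediction, which is why it cannot replace the posterior predictive samples.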

I’m trying to wrap my head around what you mean by “Unbiased” because that term has a specific meaning in statistics. If you’re asking whether LOO will tend to overestimate fit, it won’t. I suspect you’re somewhat confusing LOO with the jackknife, and looking for jackknife-debiased estimates. If that’s the case, taking the mean of the posterior predictive does not return an estimate for the population mean that is unbiased in the formal statistical sense. This is good, though. There’s a bias-variance tradeoff, and using the mean of the posterior predictive will automatically optimize it to give you the prediction with the minimum mean squared error.* Outside of intro stats classes where unbiased estimates are the norm, no conversation has ever gone like this –
“I have a problem – my estimates are all wildly wrong in different directions.”
“OK, but do you know which direction they’re wrong in?”
“No.”
“Then it’s variance and not bias, so it’s OK.”

If you really want unbiased estimates, you can get those by setting flat priors on everything, but this means you’ll be giving up the reduced variance from being able to set reasonable priors.

*Note that it minimizes the mean squared error assuming your priors are in some sense “Correct.” If you put in a really dumb prior, you’ll get really dumb point estimates. There’s no way to get a free lunch here – any prior that reduces the variance has to risk introducing bias. A flat prior minimizes the worst-case scenario, but will have a MSE higher than any prior that concentrates probability in an area that’s even remotely close to the real parameter value.
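A quick simulation makes the footnote concrete (plain numpy; the normal-normal setup and the particular prior values are made up for illustration): a posterior-mean estimator that shrinks toward a prior reasonably close to the truth beats the unbiased sample mean in MSE.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, reps = 1.0, 2.0, 5, 20000

# Each row is one replication of n noisy observations of theta.
x = rng.normal(theta, sigma, size=(reps, n))
xbar = x.mean(axis=1)

# Unbiased estimator: the sample mean. Its MSE is pure variance.
mse_unbiased = np.mean((xbar - theta) ** 2)

# Biased estimator: posterior mean under a N(0, 1) prior on theta
# (conjugate normal-normal update), which shrinks xbar toward 0.
w = (n / sigma**2) / (n / sigma**2 + 1.0)
mse_shrunk = np.mean((w * xbar - theta) ** 2)
```

Here the shrinkage estimator is biased toward 0, yet its total MSE is lower; move `theta` far from the prior and the ordering flips, which is exactly the no-free-lunch point above.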

@OriolAbril Got it, thank you for clearing up the point on the loo function. My reasoning was that I could use the ELPD to get the predicted probability, and that this would be analogous to K-fold CV. The predictions will be used to split the data by predicted class (1 if probability >= 0.5); observations in class 1 will go to a second ML model in production.

Perhaps instead of using the term “unbiased” I should ask this question in terms of target variable leakage. I am dealing with a situation where I can’t yet observe Y for some data. What I want to know is whether the samples from sample_posterior_predictive would represent target leakage in the second ML model. If I were to take the outputs from the probabilistic model (say, mean probability and standard deviation) and use them as features in the production ML model, would the training-set predictions contain leaked information about Y, since Y was observed when producing those probabilities?
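To make the proposed pipeline concrete, here is a numpy-only sketch of the splitting step described above (the 0.5 threshold and the mean/std features come from the post; the arrays are fake stand-ins for posterior predictive output):

```python
import numpy as np

# Fake posterior predictive draws of y, shape (n_draws, n_obs).
rng = np.random.default_rng(7)
ppc_y = rng.binomial(1, np.array([0.2, 0.7, 0.9, 0.3]), size=(1000, 4))

# Candidate features for the downstream ML model.
p_mean = ppc_y.mean(axis=0)
p_std = ppc_y.std(axis=0)

# Split: observations predicted as class 1 (p >= 0.5) are routed to
# the second production model.
to_second_model = p_mean >= 0.5
```

The leakage question is then whether `p_mean`/`p_std` computed on rows whose Y was observed carry information about that Y into the second model’s training set.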

@Lime, I will look into jackknife-debiased estimates, but hopefully my clarification above helps.

Samples taken from sample_posterior_predictive depend on the data you put in, and will therefore contain information present in that data. Training an ML model on these outputs rather than on the original data will not prevent it from replicating patterns seen in the original data, because your PyMC3 model has learned to reproduce those patterns, including information about the observed Y. I’m also not sure why you’re proposing to train a model on the outputs of another model rather than on the data directly.

(Jackknife-debiasing will probably not be useful here, I was confused by your use of the term “unbiased.”)