Are my posterior predictive samples biased if I observe Y?

Samples drawn with sample_posterior_predictive are conditioned on the observed data you passed to the model, so they carry the information present in that data. Training an ML model on these samples rather than on the original data will not keep it from replicating patterns seen in the original data, because your PyMC3 model has itself been fit to reproduce those patterns. I’m not sure why you’re proposing to train a model on the outputs of another model rather than on the data directly.
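To make the dependence concrete, here is a minimal sketch that uses only the Python standard library. It is not your PyMC3 model and does not call PyMC3 at all: I'm assuming a toy conjugate model (normal likelihood with known sigma, normal prior on the mean), and the function name posterior_predictive and all the parameter values are made up for illustration. The point is only that the predictive draws follow whatever data you condition on.

```python
import random
import statistics


def posterior_predictive(y_obs, n_draws=5000, sigma=1.0,
                         prior_mu=0.0, prior_sd=10.0, seed=0):
    """Posterior predictive draws for a toy normal-normal model.

    Hypothetical stand-in for a fitted model: y ~ Normal(mu, sigma) with
    sigma known, and prior mu ~ Normal(prior_mu, prior_sd).
    """
    rng = random.Random(seed)
    n = len(y_obs)
    # Conjugate update for the mean of a normal with known variance.
    post_prec = 1.0 / prior_sd**2 + n / sigma**2
    post_mean = (prior_mu / prior_sd**2 + sum(y_obs) / sigma**2) / post_prec
    post_sd = post_prec ** -0.5
    draws = []
    for _ in range(n_draws):
        mu = rng.gauss(post_mean, post_sd)   # draw a parameter from the posterior
        draws.append(rng.gauss(mu, sigma))   # then simulate a new observation
    return draws


# Two made-up datasets that differ only in where they are centered.
data_rng = random.Random(1)
y_low = [data_rng.gauss(-2.0, 1.0) for _ in range(200)]
y_high = [data_rng.gauss(3.0, 1.0) for _ in range(200)]

ppc_low = posterior_predictive(y_low)
ppc_high = posterior_predictive(y_high)

# The predictive samples track the dataset the model was conditioned on.
print("observed mean:", statistics.mean(y_low), "predictive mean:", statistics.mean(ppc_low))
print("observed mean:", statistics.mean(y_high), "predictive mean:", statistics.mean(ppc_high))
```

The prior is the same in both runs, so the gap between the two sets of predictive samples comes entirely from the observed data. Anything trained on those samples inherits that structure.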

(Jackknife debiasing will probably not be useful here; I was confused by your use of the term “unbiased.”)