I’m having trouble understanding what’s going on when pm.sample_ppc is called. I generated some data from a simple linear regression and then drew samples from the posterior predictive distribution with this code:
I did not execute your code, but by looking at it there are some changes that you should make.
First, you don’t have to define a probability distribution for x. I’m no expert, but it seems redundant.
In the last line, set the sample size to samples=10000. If you collect a single sample, you get only one draw from the predictive distribution, which may lie close to the mode of that distribution or far away from it. If you collect a sufficient number of samples, you can estimate the predictive distribution.
As I said, once you collect many samples for “y”, you have to consider the mode or mean of those samples for a single prediction.
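To make the point about summarizing many draws concrete, here is a minimal NumPy sketch. It does not use the code from the question: the array y_draws is a simulated stand-in (an assumption) for the predictive samples of "y" at a single input, and the location and scale values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in for 10000 predictive draws of y at one input.
# In practice these values would come from the sampler output.
y_draws = rng.normal(loc=3.0, scale=0.5, size=10_000)

# A single-number prediction: the mean of the draws.
point_prediction = y_draws.mean()

# The spread of the draws describes the predictive uncertainty.
lower, upper = np.percentile(y_draws, [2.5, 97.5])

print(f"prediction: {point_prediction:.2f}, 95% interval: [{lower:.2f}, {upper:.2f}]")
```

A single draw from y_draws could land anywhere inside that interval, which is why one sample is not a reliable prediction.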
Perhaps I should clarify my question a little - my goal with all of this is to generate simulated datasets of both the x and the y variable, and that’s why x has a prior distribution. You can see that the distribution of x from sample_ppc is fine, but the correlation with y isn’t carried through the model.
The conditional in sample_ppc is not set up properly in this case; as a result, the new y_hat is not conditioned on the newly generated x. The reason is that in sample_ppc the RVs are only updated according to the input point (a dictionary containing one posterior sample), not according to the newly generated values.
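A toy NumPy illustration of that effect, not the actual sample_ppc internals: the parameter values and the array x_held are hypothetical stand-ins, with x_held playing the role of whatever x values the model already holds when y_hat is drawn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values standing in for one posterior sample (the "point").
alpha, beta, sigma = 1.0, 2.0, 0.5

# x values the model already holds (assumed stand-in, not real data).
x_held = rng.normal(0.0, 1.0, size=5000)

# Freshly generated x, as a predictive draw of x would produce.
x_new = rng.normal(0.0, 1.0, size=5000)

# y_hat is computed from the held x, not from x_new.
y_hat = rng.normal(alpha + beta * x_held, sigma)

# Correlation between the new x and y_hat, and between the held x and y_hat.
corr_new = np.corrcoef(x_new, y_hat)[0, 1]
corr_held = np.corrcoef(x_held, y_hat)[0, 1]

print(f"corr(x_new, y_hat)  = {corr_new:.3f}")
print(f"corr(x_held, y_hat) = {corr_held:.3f}")
```

Each marginal looks fine on its own, but x_new and y_hat are generated independently, so their correlation is near zero.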
For now, the only way to achieve what you want is to write your own ppc function to do the posterior generation. I think this is something we should improve; could you raise an issue on GitHub?
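A rough sketch of what such a hand-written function could look like, using plain NumPy on posterior samples rather than any PyMC API. Everything named here is an assumption for illustration: the function sample_joint_ppc, the dictionary keys (alpha, beta, sigma, mu_x, sigma_x), and the fake posterior used to exercise it. In a real workflow the dictionary would be filled from the trace of the fitted model.

```python
import numpy as np

def sample_joint_ppc(posterior, n_draws, n_obs, rng):
    """Simulate (x, y) datasets, drawing y conditional on the newly drawn x.

    posterior: dict of 1-D arrays of posterior samples (hypothetical keys).
    Returns two arrays of shape (n_draws, n_obs).
    """
    n_post = len(posterior["alpha"])
    x_sim = np.empty((n_draws, n_obs))
    y_sim = np.empty((n_draws, n_obs))
    for i in range(n_draws):
        j = rng.integers(n_post)  # pick one posterior sample at random
        # Step 1: generate a new x from its distribution.
        x_sim[i] = rng.normal(posterior["mu_x"][j], posterior["sigma_x"][j], size=n_obs)
        # Step 2: generate y using the x that was just drawn.
        mean_y = posterior["alpha"][j] + posterior["beta"][j] * x_sim[i]
        y_sim[i] = rng.normal(mean_y, posterior["sigma"][j])
    return x_sim, y_sim

# Fake posterior samples, only to make the sketch runnable.
rng = np.random.default_rng(42)
n_post = 1000
posterior = {
    "alpha": rng.normal(1.0, 0.05, n_post),
    "beta": rng.normal(2.0, 0.05, n_post),
    "sigma": np.abs(rng.normal(0.5, 0.05, n_post)),
    "mu_x": rng.normal(0.0, 0.05, n_post),
    "sigma_x": np.abs(rng.normal(1.0, 0.05, n_post)),
}

x_sim, y_sim = sample_joint_ppc(posterior, n_draws=200, n_obs=50, rng=rng)
corr = np.corrcoef(x_sim.ravel(), y_sim.ravel())[0, 1]
print(f"corr(x_sim, y_sim) = {corr:.3f}")
```

Because y is drawn from the x generated in the same iteration, the simulated datasets keep the x-y relationship implied by the regression.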