I did not follow the discussion, but here are some thoughts on your main post:
I.
I think you have a misconception about sampling. When you write down your model (i.e., put all the information under a with pm.Model() as ... context), your parameter space is fixed. Inference then extracts information from this space: for example, the MAP is the (hopefully global) maximum of the space, and MCMC samples are draws from the space (correlated rather than strictly IID, but used the same way) that you can plug into another function to compute an expectation.
In that regard, there is no distinction between the MAP before and after sampling. In other words, there is no prior or posterior MAP, as sampling does not change the find_MAP result.
Now combine this with theano.shared variables in a linear model. In this case, if you set the theano.shared variable to different values, your MAP result will change, because you are now in a different parameter space. However, this has little to do with posterior prediction. Fixing the inputs to specific values to get a posterior prediction is equivalent to sampling from a conditional probability distribution: you condition on the posterior and the new inputs.
So to recap: the internal state of the model does not change before or after sampling, but it does change once you set new X, y values after sampling.
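To make the recap concrete, here is a minimal NumPy-only sketch. It does not use PyMC3: it assumes a toy linear model with a known noise sd and a Gaussian prior on the coefficients, so the posterior is Gaussian and the MAP has a closed form. The data, sigma, tau and the posterior helper are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = 1 + 2*x + noise. sigma (noise sd) and tau (prior sd)
# are assumed known here purely to keep the posterior in closed form.
sigma, tau = 0.5, 10.0
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=sigma, size=50)

def posterior(X, y):
    """Mean and covariance of the Gaussian posterior over the coefficients."""
    precision = X.T @ X / sigma**2 + np.eye(X.shape[1]) / tau**2
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / sigma**2
    return mean, cov

# For a Gaussian posterior the MAP coincides with the posterior mean.
map_before, cov = posterior(X, y)

# "Sampling": draw from the posterior. This reads the target, it does not move it.
draws = rng.multivariate_normal(map_before, cov, size=2000)
map_after, _ = posterior(X, y)

# Swapping in different data defines a different posterior, so the MAP moves.
X_new = np.column_stack([np.ones(50), rng.normal(size=50)])
y_new = X_new @ np.array([-1.0, 0.5]) + rng.normal(scale=sigma, size=50)
map_swapped, _ = posterior(X_new, y_new)
```

Here map_before and map_after are identical, while map_swapped differs, which mirrors what happens when you call set_value on a theano.shared variable and then rerun find_MAP.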
II.
Depending on your aim, you can of course compute the average along the n-samples axis, which gives you the expectation of the posterior prediction, or you can work with the raw samples directly. In the latter case, for each new observation (X_i', y_i') you have samples y_i' \sim \pi(X_i' \mid posterior).
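As a rough sketch of both options, assume the posterior predictive draws are stored in an array of shape (n_samples, n_new_obs). The ppc array below is a random stand-in generated with NumPy, not real model output.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for posterior predictive draws, shape (n_samples, n_new_obs).
# In practice this array would come from your sampler, not from rng.normal.
n_samples, n_new = 1000, 5
ppc = rng.normal(loc=np.arange(n_new), scale=1.0, size=(n_samples, n_new))

# Option 1: average over the sample axis to get one expected value per observation.
pred_mean = ppc.mean(axis=0)

# Option 2: keep the raw draws and summarise them, e.g. a 95% interval
# for each new observation.
lower, upper = np.percentile(ppc, [2.5, 97.5], axis=0)
```

Working with the raw draws keeps the full predictive distribution for each y_i', so you can compute any summary you need rather than only the mean.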
III.
There is a way to distinguish the residual from the parameters and the linear prediction. You can take the posterior of \beta and dot-multiply it with the new X_i', which gives you the linear prediction with uncertainty. If you then plug that, along with sd, into a Normal distribution and generate samples (essentially what sample_posterior_predictive is doing), you get the posterior prediction with the model residual. See also the discussion here: Uncertainty of Model Predictions
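The two steps can be sketched in plain NumPy. The beta and sd arrays below are random stand-ins for posterior draws (in practice you would pull them from your trace), and X_new is a hypothetical design matrix with an intercept column.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for posterior draws; replace with the draws from your own trace.
n_samples = 2000
beta = rng.normal(loc=[1.0, 2.0], scale=0.1, size=(n_samples, 2))
sd = np.abs(rng.normal(loc=0.5, scale=0.05, size=n_samples))

# Hypothetical new inputs: intercept column plus one predictor.
X_new = np.column_stack([np.ones(3), np.array([-1.0, 0.0, 1.0])])

# Step 1: linear prediction with parameter uncertainty only,
# one row per posterior draw, one column per new observation.
mu = beta @ X_new.T

# Step 2: add the model residual by drawing from Normal(mu, sd),
# pairing each draw of mu with the sd from the same posterior draw.
y_pred = rng.normal(loc=mu, scale=sd[:, None])
```

The spread of mu across draws reflects uncertainty in \beta alone, while the spread of y_pred also includes the residual noise, so intervals from y_pred are wider.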