In the Bayesian paradigm, the observations are fixed, but they are taken to be a realization/sample from a random variable. When doing posterior predictive sampling on the same data the model was fitted on, each draw simulates that observation/sample, and since you have multiple observations and assume they all come from the same distribution, we can generate a histogram/KDE/CDF for every single draw. These are the red lines in your plot. The black line is the same thing, but computed from your observations, so it can only be one line. The "posterior predictive mean" is the histogram of all posterior predictive samples pooled together (all observations and all draws). KDEs, histograms… all still have some randomness in them, so it can be useful to also have this somewhat less noisy visualization.
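Here is a minimal sketch of what that corresponds to, mimicking by hand what `az.plot_ppc` does. The arrays `observed` and `pp` are random stand-ins (not output of a real fitted model), just to show the structure: `s` draws of `n` simulated observations each.

```python
import numpy as np
import arviz as az
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
observed = rng.normal(0.0, 1.0, size=100)   # the n fixed observations
pp = rng.normal(0.0, 1.0, size=(500, 100))  # s draws, each simulating all n observations

fig, ax = plt.subplots()
# one red KDE per posterior predictive draw (only a subset, for speed)
for draw in pp[:50]:
    az.plot_kde(draw, ax=ax, plot_kwargs={"color": "red", "alpha": 0.1})
# a single black KDE from the observations
az.plot_kde(observed, ax=ax, plot_kwargs={"color": "black"})
# dashed line: the "posterior predictive mean", a KDE over all draws
# and all observations pooled together
az.plot_kde(pp.ravel(), ax=ax, plot_kwargs={"color": "C0", "linestyle": "--"})
plt.show()
```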
When you take the mean, you are not comparing "equivalent" realizations. It would be like saying that your observation process is not taking draws from the random variable in your model, but instead taking s samples of n observations each and then taking the mean over the s samples to get the n observations. It is not exactly the "distribution of the mean" instead of the "mean distribution" (which is what the dashed line in `plot_ppc` attempts to show), but rather the "distribution of pointwise means", something like that.
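You can see the difference directly with the same kind of stand-in array as above (again random placeholder data, not a real model):

```python
import numpy as np
import arviz as az
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
pp = rng.normal(0.0, 1.0, size=(500, 100))  # s draws x n observations

pooled = pp.ravel()          # "mean distribution": all draws pooled (the dashed line)
pointwise = pp.mean(axis=0)  # "distribution of pointwise means": n values,
                             # each averaged over the s draws

fig, ax = plt.subplots()
az.plot_kde(pooled, ax=ax, plot_kwargs={"color": "C0", "linestyle": "--"})
az.plot_kde(pointwise, ax=ax, plot_kwargs={"color": "C1"})
plt.show()
```

The pointwise-mean KDE comes out much narrower than the pooled one: averaging over the draws removes most of the sampling variability that the model says each individual observation should have, so it is not comparable to the black line of the observations.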