# Simulating subgroup selection

I am pretty new to Bayesian modeling and PyMC, and I am still trying to get my head around the concepts and the underlying math. Apologies in advance if the question is simple or if I have misunderstood some of the basic concepts.

I am starting with the Educational Outcomes for Hearing-impaired Children example on the PyMC homepage.

After running the code on the homepage I wanted to understand the results of the simulation. So I sample from the posterior predictive and then calculate the mean:

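```python
# Continues from the example: pm (import pymc as pm), test_score_model and idata are defined there.
with test_score_model:
    pm.sample_posterior_predictive(idata, extend_inferencedata=True, random_seed=42)

idata.posterior_predictive.mean()  # = 87.92
```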

I get a distribution that is centered around a mean of 88 and could derive from this distribution how likely a score would be for a random child.
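To make "how likely a score would be" concrete, here is a minimal sketch of what I have in mind. It uses synthetic draws as a stand-in for the flattened posterior predictive samples; the array `pp_draws`, its normal shape and spread, and the threshold of 70 are made up for illustration and are not taken from the model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for flattened posterior predictive draws of the score.
# In the real workflow these would come from idata.posterior_predictive.
pp_draws = rng.normal(loc=88.0, scale=20.0, size=10_000)

threshold = 70.0  # arbitrary illustrative cut-off

# Monte Carlo estimate of P(score < threshold) for a randomly chosen child.
prob_below = float(np.mean(pp_draws < threshold))

# Central 90% predictive interval from the same draws.
lower, upper = np.percentile(pp_draws, [5, 95])

print(f"P(score < {threshold}): {prob_below:.3f}")
print(f"90% interval: [{lower:.1f}, {upper:.1f}]")
```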

From the actual observations I can calculate the mean score as 84 and if I only look at children with family_inv > 1, the average score drops down to 71:

```python
test_scores['score'].mean()  # = 84
test_scores.loc[test_scores['family_inv'] > 1, "score"].mean()  # = 71
```

Translated into real life: according to the observed data, a random child would have an average score of 84, and if I learn that family involvement is poor (i.e. high family_inv), I would expect the score to drop to 71.

My question is whether, and how, I could run the same analysis on the predictions. How could I reflect constraints or insights from the real world in my prediction? How would my predicted distribution change if I only want to consider children with a high family_inv value?
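The kind of analysis I am imagining could be sketched as follows. This is only an illustration on synthetic numbers, under the assumption that the posterior predictive array has one column per observed child, in the same order as the rows of `test_scores`. The names `pp`, `family_inv` and the made-up effect size are hypothetical and do not come from the model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_children = 200
# Synthetic covariate standing in for test_scores['family_inv'] (values 0..4).
family_inv = rng.integers(0, 5, size=n_children)

# Synthetic predictive draws with shape (chain, draw, child).
# Made-up effect: each unit of family_inv lowers the expected score by 6.
expected = 95.0 - 6.0 * family_inv
pp = rng.normal(loc=expected, scale=15.0, size=(2, 500, n_children))

# Keep only the columns belonging to children with family_inv > 1.
mask = family_inv > 1
pp_subgroup = pp[:, :, mask]

overall_mean = float(pp.mean())
subgroup_mean = float(pp_subgroup.mean())

# Distribution of the subgroup's average score across all draws.
subgroup_mean_per_draw = pp_subgroup.mean(axis=-1).reshape(-1)

print(f"overall predictive mean:  {overall_mean:.1f}")
print(f"subgroup predictive mean: {subgroup_mean:.1f}")
```

As far as I understand, this would only re-summarise the predictions for children who are already in the data. For a hypothetical new child, I assume the predictor values would have to be passed to the model instead.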

I would really appreciate it if someone could point me in the right direction. Thanks a lot in advance!

Best regards
Oliver