Simulating subgroup selection

olf359 · September 14, 2023, 7:45am

I am pretty new to Bayesian modeling and PyMC and am trying to get my head around the concepts and the underlying math. So apologies in advance if the question is simple or if I have misunderstood some of the basic concepts.

I am starting with the "Educational Outcomes for Hearing-impaired Children" example on the PyMC homepage.

After running the code from that example, I wanted to understand the results of the simulation. So I sampled from the posterior predictive and then calculated the mean:

with test_score_model: 
    pm.sample_posterior_predictive(idata, extend_inferencedata=True, random_seed=42)
idata.posterior_predictive.mean() # = 87.92

I get a distribution centered on a mean of about 88, and from this distribution I could derive how likely a given score would be for a random child.
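To make "how likely a score would be" concrete, here is a minimal sketch of reading a probability and an interval off a set of predictive draws. The draws are synthetic stand-ins, not the output of the model above: the spread of 20 and the threshold of 70 are assumptions chosen only for illustration, and `prob_below` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for flattened posterior predictive draws of the score.
# The mean of 88 mirrors the post; the spread of 20 is an assumption.
draws = rng.normal(loc=88, scale=20, size=10_000)


def prob_below(samples, threshold):
    """Fraction of predictive draws that fall below `threshold`."""
    return float(np.mean(samples < threshold))


# Estimated probability that a random child scores below 70.
p_below_70 = prob_below(draws, 70)

# Central 94% interval of the predicted scores.
lower, upper = np.percentile(draws, [3, 97])

print(f"P(score < 70) ~ {p_below_70:.3f}")
print(f"94% interval: [{lower:.1f}, {upper:.1f}]")
```

With real results, the same two calculations would apply to the flattened values of the posterior predictive variable instead of `draws`.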

From the actual observations I can calculate the mean score as 84 and if I only look at children with family_inv > 1, the average score drops down to 71:

test_scores['score'].mean() # = 84
test_scores.loc[test_scores['family_inv']>1, "score"].mean() # = 71

So if I translate this into real life: according to the observed data, a random child would have an average score of 84, and if I learn that family involvement is poor (i.e. high family_inv), I would predict from the observed data that the score drops to 71.

My question is whether, and how, I could run the same analysis on the predictions. How can I reflect constraints or insights from the real world in my prediction? How would my predicted distribution change if I only want to consider children with high family_inv?
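One possible way to approach this: the posterior predictive holds one predicted score per observed child for every draw, so the same boolean mask used on the observed data could be applied along the child axis. Below is a minimal sketch under stated assumptions. The arrays are synthetic stand-ins with shape (n_draws, n_children), not the real model output, the effect size of 5 points per family_inv level is made up, and `subgroup_predictive_means` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins (NOT the real data set or model output).
n_draws, n_children = 2000, 100
family_inv = rng.integers(0, 5, size=n_children)  # one value per child

# Assumed structure: higher family_inv lowers the expected score.
# Shape (n_draws, n_children), like flattened posterior predictive draws.
ppc = rng.normal(loc=90 - 5 * family_inv, scale=15, size=(n_draws, n_children))


def subgroup_predictive_means(ppc_draws, mask):
    """Mean predicted score per draw over the children selected by `mask`."""
    return ppc_draws[:, mask].mean(axis=1)


# Same condition as on the observed data: family_inv > 1.
high_inv = family_inv > 1

all_means = subgroup_predictive_means(ppc, np.ones(n_children, dtype=bool))
sub_means = subgroup_predictive_means(ppc, high_inv)

# Each array holds one subgroup mean per draw, so it carries uncertainty too.
sub_lower, sub_upper = np.percentile(sub_means, [3, 97])

print(f"all children:   {all_means.mean():.1f}")
print(f"family_inv > 1: {sub_means.mean():.1f} "
      f"(94% interval {sub_lower:.1f} to {sub_upper:.1f})")
```

With the real results, `ppc` would be replaced by the posterior predictive values with the chain and draw dimensions stacked into one, and `family_inv` by `test_scores['family_inv'].values`. This only re-slices predictions for the children already in the data; predicting for hypothetical new children would instead require passing new predictor values to the model.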

I would really appreciate it if someone could point me in the right direction. Thanks a lot in advance!

Best regards
Oliver
