How to interpret posterior/prior predictive checks

J_V · September 22, 2023, 2:47pm

Hello PyMC community!

I’ve been diligently studying the documentation and following various books that leverage PyMC in their examples. Now, as I’ve ventured into creating my first model, I find myself facing some uncertainties when it comes to assessing the quality of my model.

I’m currently working on a regression analysis where the relationship is described as y = Ax + B. Both x and y are observed data that originate from real-world lab tests.

To kickstart my analysis, I initially divided my dataset into sets of four paired data points. For each set, I fitted a linear regression and then derived the A and B parameters. This process resulted in distributions for both A and B, with A following a normal distribution and B following a log-normal distribution.
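The grouping step described above could be sketched roughly as follows. This is a minimal sketch, not the code actually used: the helper name `fit_groups`, the choice of consecutive groups, and the stand-in data are all assumptions made for illustration.

```python
import numpy as np

def fit_groups(x, y, group_size=4):
    """Fit y = A*x + B separately on consecutive groups of `group_size` points."""
    n_groups = len(x) // group_size
    slopes, intercepts = [], []
    for g in range(n_groups):
        sl = slice(g * group_size, (g + 1) * group_size)
        # For deg=1, np.polyfit returns [slope, intercept]
        A, B = np.polyfit(x[sl], y[sl], deg=1)
        slopes.append(A)
        intercepts.append(B)
    return np.array(slopes), np.array(intercepts)

# Hypothetical stand-in data (the real lab measurements are not shown in the post)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=40)
y = 2.0 * x + 5.0 + rng.normal(0.0, 0.5, size=40)

slopes, intercepts = fit_groups(x, y)  # one (A, B) pair per group of four points
```

With only four points per group, each individual estimate is noisy, so the spread of `slopes` and `intercepts` mixes real variation with estimation error.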

With these distributions in hand, I turned to Scipy to estimate the parameters for the normal and log-normal distributions associated with the A and B regression parameters. Subsequently, I generated synthetic data, introducing random noise based on these estimated parameters.
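One possible reading of this step, as a hedged sketch: fit the two distributions with `scipy.stats`, convert the log-normal fit to the log-scale parameters that `pm.LogNormal` expects, and draw synthetic data. The arrays `A_hat` and `B_hat` are hypothetical stand-ins for the per-group estimates, and the way noise enters `y_syn` is an assumption, since the post does not spell it out.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical per-group estimates standing in for the fitted A and B values
A_hat = rng.normal(loc=2.0, scale=0.2, size=500)
B_hat = rng.lognormal(mean=1.5, sigma=0.3, size=500)

# Normal fit for A: returns (mean, standard deviation)
mu_A, std_A = stats.norm.fit(A_hat)

# Log-normal fit for B with the location fixed at 0.
# SciPy returns (shape, loc, scale) with shape = sd of log(B), scale = exp(mean of log(B)).
shape_B, loc_B, scale_B = stats.lognorm.fit(B_hat, floc=0)
mu_B = np.log(scale_B)  # log-scale mean, the `mu` of pm.LogNormal
std_B = shape_B         # log-scale sd, the `sigma` of pm.LogNormal

# Synthetic data (assumption: one A and one B drawn per observation)
n = 200
x_syn = rng.uniform(0.0, 10.0, size=n)
A_syn = rng.normal(mu_A, std_A, size=n)
B_syn = rng.lognormal(mu_B, std_B, size=n)
y_syn = A_syn * x_syn + B_syn
```

The conversion matters because `mu` and `sigma` in `pm.LogNormal` describe the logarithm of the variable, not the variable itself, so passing the raw sample mean and standard deviation of B would give a very different prior.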

These synthetic data were then split into two categories: prior knowledge and new data. With this setup, I proceeded to construct my PyMC model:

import pymc as pm

with pm.Model() as linear_model_s_t:
    # 1 - Define prior knowledge

    # Intercept
    intercepto = pm.LogNormal('Intercepto', mu=mu_B, sigma=std_B)

    # Slope
    declive = pm.LogNormal('Declive', mu=mu_A, sigma=std_A)

    # Standard deviation: I'm unsure about this prior
    sigma = pm.HalfNormal('sigma', sigma=10)

    # 2 - Compute the mean, which will be my Y
    # Y = Ax + B

    mu = declive * x_new + intercepto

    # 3 - Define the likelihood

    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y_new)

    # 4 - Sample from the posterior

    trace = pm.sample(2000, tune=2000, chains=4, cores=2, random_seed=rng)

    # 5 - Generate prior and posterior predictive samples

    prior_pred = pm.sample_prior_predictive(samples=1000, random_seed=rng)
    post_pred = pm.sample_posterior_predictive(trace, random_seed=rng)

However, it appears that my model might be overfitting the data, as indicated by the first row in the plot below. I’m seeking guidance on how I can enhance my model to achieve a better fit with my synthetic data.

I’m not sure if I’m interpreting these graphs correctly, so any advice or insights you can offer would be greatly appreciated!

Thank you!

drbenvincent · September 23, 2023, 11:22am

Hi @J_V. Can you explain the plots a bit more and why you think they indicate overfitting?

J_V · September 25, 2023, 12:19pm

Hello,
I guess that in ax[0,0] the distribution is shifted slightly to the left, and in ax[0,1] the left part also falls outside the gray zone. Is that reasoning correct?
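A visual impression like this can be backed up with a number. The sketch below is a NumPy-only illustration of a posterior predictive check, not output from the model in this thread: the arrays `slope`, `intercept`, and `sigma` are hypothetical stand-ins for posterior draws, and `x_new` and `y_new` are made-up data. It simulates replicated datasets, measures how many observations fall inside a 94% predictive band, and computes a posterior predictive p-value for the mean.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical observed data
x_new = rng.uniform(0.0, 10.0, size=50)
y_new = 2.0 * x_new + 5.0 + rng.normal(0.0, 1.0, size=50)

# Hypothetical posterior draws (in practice these would come from the trace)
n_draws = 1000
slope = rng.normal(2.0, 0.05, size=n_draws)
intercept = rng.normal(5.0, 0.3, size=n_draws)
sigma = np.abs(rng.normal(1.0, 0.1, size=n_draws))

# One replicated dataset per posterior draw: shape (n_draws, n_obs)
mu = slope[:, None] * x_new[None, :] + intercept[:, None]
y_rep = rng.normal(mu, sigma[:, None])

# Fraction of observations inside the pointwise 94% predictive band
lower, upper = np.percentile(y_rep, [3, 97], axis=0)
coverage = np.mean((y_new >= lower) & (y_new <= upper))

# Posterior predictive p-value for the mean of y
p_mean = np.mean(y_rep.mean(axis=1) >= y_new.mean())

print(f"coverage of 94% band: {coverage:.2f}, p-value for mean: {p_mean:.2f}")
```

Under these assumptions, a `coverage` far below 0.94 or a `p_mean` close to 0 or 1 points to a systematic mismatch between model and data, such as the observed curve sitting to one side of the predictive draws. A mismatch of that kind indicates misfit in location or spread, which is a different question from overfitting.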
