Based on the "Instrumental Variable Modelling (IV) with pymc models" article in the CausalPy 0.4.2 documentation, it seems clear that there is a positive correlation between the treatment variable (X: risk) and the outcome variable (Y: GDP). However, looking at the correlation matrix and the last plot (in green), I fail to understand why they suggest a negative correlation between the two. Can somebody please help me understand the interpretation? What am I missing?
This might be one for @Nathaniel_Forde
Sorry @ykarle. Generally, if you have X and Y which are negatively correlated and you fit a regression Y \sim X, you should derive \beta_{1} = Cov(X, Y)/Var(X), which has the same sign as the correlation.
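As a quick check of that identity, here is a minimal sketch on simulated data. The variable names and the slope of -0.5 are made up for illustration and have nothing to do with the risk/GDP data in the docs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data, built so that X and Y are negatively related (assumed slope -0.5).
x = rng.normal(size=1000)
y = -0.5 * x + rng.normal(size=1000)

# Simple-regression slope from the covariance formula: Cov(X, Y) / Var(X).
beta_1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# The same slope from a least-squares line fit, and the plain correlation.
slope_lstsq = np.polyfit(x, y, 1)[0]
corr_xy = np.corrcoef(x, y)[0, 1]

print(f"beta_1 = {beta_1:.3f}, polyfit slope = {slope_lstsq:.3f}, corr = {corr_xy:.3f}")
```

The two slope estimates agree, and both carry the sign of the correlation, because Var(X) is always positive.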
In classical IV models you conduct a 2SLS estimate, which means you first regress X on the instrument I (X \sim I) to get the fitted values X_{hat}, and then fit Y \sim X_{hat}.
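The two stages can be sketched with plain NumPy least squares. This is only an illustration on simulated data, not how CausalPy does it: the helper `ols`, the unobserved `confounder`, and the true effect of 2.0 are all assumptions of the example.

```python
import numpy as np

def ols(design, target):
    """Least-squares coefficients for target ~ design (hypothetical helper)."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 5000

# Simulated data: the confounder drives both X and Y, so X is endogenous.
instrument = rng.normal(size=n)
confounder = rng.normal(size=n)  # unobserved in practice
x = 1.0 * instrument + confounder + rng.normal(size=n)
y = 2.0 * x - 3.0 * confounder + rng.normal(size=n)  # assumed true effect: 2.0

ones = np.ones(n)

# Stage 1: regress X on the instrument and keep the fitted values X_hat.
stage1_design = np.column_stack([ones, instrument])
x_hat = stage1_design @ ols(stage1_design, x)

# Stage 2: regress Y on X_hat; the slope is the 2SLS estimate.
beta_2sls = ols(np.column_stack([ones, x_hat]), y)[1]

# Naive OLS of Y on the raw X, for comparison (biased by the confounder).
beta_ols = ols(np.column_stack([ones, x]), y)[1]

print(f"2SLS slope = {beta_2sls:.2f}, naive OLS slope = {beta_ols:.2f}")
```

In this setup the naive OLS slope is pulled well below the assumed effect, while the 2SLS slope lands close to it. Note that the standard errors from a hand-rolled second stage like this are not valid, so it is only good for the point estimate.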
In the Bayesian setting we’re also estimating two equations, one for X_{hat} and one for Y, and we’re modelling these outcomes with a bi-variate correlation structure.
The plot in green shows the estimated correlation between the outcomes X_{hat} and Y as modelled.
The plot in orange shows the regression fit Y \sim X with the X input at the raw scale and the beta coefficient purged of the endogeneity bias, as you would achieve with 2SLS.
Hope that’s clear, but essentially it’s because we have three different variables in frame here, X, X_{hat} and Y, so the signs don’t need to be the same.
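To make the sign point concrete, here is a rough simulation of a two-equation system in which the slope on X is positive while the correlation between the errors of the two equations is negative. It is a sketch under assumed values (slope 0.8, error correlation -0.6), not a reproduction of the CausalPy model or of the plots in the docs.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Errors of the two equations, drawn with an assumed correlation of -0.6.
error_cov = np.array([[1.0, -0.6], [-0.6, 1.0]])
err_x, err_y = rng.multivariate_normal([0.0, 0.0], error_cov, size=n).T

instrument = rng.normal(size=n)
x = 1.0 * instrument + err_x  # treatment equation
y = 0.8 * x + err_y           # outcome equation, assumed positive effect 0.8

# With a single instrument, 2SLS reduces to Cov(I, Y) / Cov(I, X).
beta_iv = np.cov(instrument, y)[0, 1] / np.cov(instrument, x)[0, 1]

# Residuals of each equation after removing its fitted part.
stage1_slope = np.cov(instrument, x)[0, 1] / np.var(instrument, ddof=1)
resid_x = (x - x.mean()) - stage1_slope * (instrument - instrument.mean())
resid_y = (y - y.mean()) - beta_iv * (x - x.mean())

resid_corr = np.corrcoef(resid_x, resid_y)[0, 1]
raw_corr = np.corrcoef(x, y)[0, 1]

print(f"IV slope = {beta_iv:.2f}, raw corr(X, Y) = {raw_corr:.2f}, "
      f"residual corr = {resid_corr:.2f}")
```

Here the raw correlation and the IV slope are both positive, yet the correlation between the two equations' residuals is negative, so a positive slope and a negative estimated correlation can sit side by side without contradiction.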
Note: I should adjust the label on the plot to make that clearer! Good call out.
Thanks Nathaniel, that does make sense and makes it a lot clearer, thanks for the detailed explanation. I have marked this as the solution now!