Instrumental variable analysis with binary outcome

I’ve been playing around with a model to estimate the causal effect of a binary intervention on a binary outcome. There is also an unobserved confounder causing both the “exposure” and the “outcome”. The idea is to use instrumental variable analysis. However, the sampling is a disaster and the model is not even close to retrieving the parameter of interest (the true effect of X on Y is −0.5 on the log-odds scale in the simulation below).

The code below is based on this blog post.

import pymc as pm
import numpy as np
import pandas as pd
import arviz as az
import pytensor.tensor as pt
from scipy.special import expit as invlogit

rng = np.random.default_rng()

sample_size = 1000

I = rng.binomial(p=0.8, n=1, size=sample_size)  # binary instrument
U = rng.normal(loc=0, scale=1, size=sample_size)  # unobserved confounder
# Binary treatment: driven by the instrument and the confounder
X = rng.binomial(p=invlogit(-1 - (3 * (1 - I)) + 0.6 * U), n=1, size=sample_size)
# Binary outcome: the true effect of X is -0.5 on the log-odds scale
Y = rng.binomial(p=invlogit(-1 - 0.5 * X + 0.9 * U), n=1, size=sample_size)

dat_sim = pd.DataFrame({
    'X': X,
    'I': I,
    'Y': Y,
    'U': U
})

with pm.Model() as iv_model_bin:
    intercept_y = pm.Normal('intercept_y', mu=0, sigma=1)
    intercept_t = pm.Normal('intercept_t', mu=0, sigma=1)
    beta_t = pm.Normal('beta_t', mu=0, sigma=1)  # effect of X on Y (the target parameter)
    beta_z = pm.Normal('beta_z', mu=0, sigma=1)  # effect of the instrument on X
    sd_dist = pm.HalfCauchy.dist(beta=5, shape=2)
    chol, _, _ = pm.LKJCholeskyCov('chol_cov', eta=2, n=2, sd_dist=sd_dist)

    # Linear predictors for the outcome and treatment equations
    mu_y = pm.Deterministic('mu_y', beta_t * dat_sim.X + intercept_y)
    mu_t = pm.Deterministic('mu_t', beta_z * dat_sim.I + intercept_t)
    mu = pm.Deterministic('mu', pt.stack([mu_y, mu_t], axis=1))

    # Correlated latent errors tie the two equations together
    mv = pm.MvNormal('mv', mu=mu, chol=chol, shape=(sample_size, 2))

    # Both observed binary variables share the latent logits above
    likelihood = pm.Bernoulli('likelihood', logit_p=mv,
                              observed=np.stack([dat_sim.Y, dat_sim.X], axis=1),
                              shape=(sample_size, 2))

    idata_iv_model_bin = pm.sample()

az.summary(idata_iv_model_bin, var_names=['intercept_y', 'intercept_t', 'beta_t', 'beta_z'])

My suspicions are:

  1. The model code/specification is causing the issues. (I’m hoping this is it.) I’ve tried another version with two separate observed likelihoods, one for Y and one for X, but the results are roughly the same.
  2. Something about the data generating process is causing instrumental variable analysis to fail here. (I’ve not yet found anything in the literature suggesting that IV will cause biased estimates in this specific situation.)
  3. The priors are off. (I find it hard to wrap my mind around the priors here honestly.)
  4. There is something fundamentally wrong with me as a person. (Likely true.)

Why is my model a disaster? Any pointers, ideas or suggestions to help me figure this out will be greatly appreciated.

CC @Nathaniel_Forde and @drbenvincent, who have been working on IV models recently

Will try to look tonight or tomorrow. But maybe the first thing to check is what a simple 2SLS approach recovers from your DGP?

Thanks guys! The 2SLS approach does not recover the DGP. Perhaps this suggests that the effect can’t be estimated with IV regression. However, I would love to understand why…

Here is what I get with 2sls.

from linearmodels.iv import IV2SLS
# Y regressed on X, with X instrumented by I
formula = 'Y ~ 1 + [X ~ I]'
mod = IV2SLS.from_formula(formula, dat_sim)
iv_res = mod.fit()
print(iv_res)

Results:

                          IV-2SLS Estimation Summary                          
==============================================================================
Dep. Variable:                      Y   R-squared:                      0.0007
Estimator:                    IV-2SLS   Adj. R-squared:                -0.0003
No. Observations:                1000   F-statistic:                    0.0061
Date:                Tue, Jul 09 2024   P-value (F-stat)                0.9376
Time:                        23:06:46   Distribution:                  chi2(1)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      0.2754     0.0363     7.5762     0.0000      0.2041      0.3466
X              0.0111     0.1420     0.0783     0.9376     -0.2672      0.2894
==============================================================================

Endogenous: X
Instruments: I
Robust Covariance (Heteroskedastic)
Debiased: False

And using causalpy:

import causalpy as cp
from causalpy.pymc_experiments import InstrumentalVariable
from causalpy.pymc_models import InstrumentalVariableRegression

sample_kwargs = {
    "tune": 1000,
    "draws": 2000,
    "chains": 4,
    "cores": 4,
    "target_accept": 0.99,
}
instruments_formula = "X  ~ 1 + I"
formula = "Y ~  1 + X"
instruments_data = dat_sim[["X", "I"]]
data = dat_sim[["Y", "X"]]
iv = InstrumentalVariable(
    instruments_data=instruments_data,
    data=data,
    instruments_formula=instruments_formula,
    formula=formula,
    model=InstrumentalVariableRegression(sample_kwargs=sample_kwargs),
)

az.summary(iv.model.idata, var_names=["beta_t", "beta_z"])[
    ["mean", "sd", "hdi_3%", "hdi_97%", "r_hat"]
]

Gives us:

                    mean     sd  hdi_3%  hdi_97%  r_hat
beta_t[Intercept]  0.030  0.031  -0.029    0.085    1.0
beta_t[I]          0.252  0.034   0.189    0.316    1.0
beta_z[Intercept]  0.277  0.037   0.211    0.345    1.0
beta_z[X]          0.006  0.142  -0.273    0.259    1.0

Thanks for checking that. I suspect it is something to do with the GLM-like DGP you have there… not entirely sure. It’s perhaps worth noting that R packages like ivtools have a distinct class for estimating IVs with GLMs: https://cran.r-project.org/web/packages/ivtools/ivtools.pdf

You could try stripping your DGP back to just the linear statements, or use a dataset that works with the linear framing, and then overlay the logit transforms one at a time to see where it breaks… something like the sketch below.
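For instance (a hypothetical sketch; the simulate helper and its logit_x/logit_y switches are mine, just for illustration):

# Hypothetical sketch: one simulator with switches that toggle the logit link
# on/off for the treatment and outcome equations independently, so you can see
# at which point IV estimation starts to break.
import numpy as np
from scipy.special import expit as invlogit

def simulate(n, logit_x=True, logit_y=True, seed=1):
    rng = np.random.default_rng(seed)
    I = rng.binomial(n=1, p=0.8, size=n)   # instrument
    U = rng.normal(size=n)                 # unobserved confounder
    eta_x = -1 - 3 * (1 - I) + 0.6 * U
    X = rng.binomial(n=1, p=invlogit(eta_x)) if logit_x else rng.normal(eta_x, 1)
    eta_y = -1 - 0.5 * X + 0.9 * U
    Y = rng.binomial(n=1, p=invlogit(eta_y)) if logit_y else rng.normal(eta_y, 1)
    return I, X, Y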

I think some simulation will help, and it may be worth trying other estimation strategies that don’t rely on the linear phrasing, e.g. the GMM method. Or try g-computation approaches… that is, ignore the parameter estimates (as they mislead) and instead impute the outcome under counterfactual settings of the treatment variable and take the difference of the imputed values. ivtools above uses g-estimation strategies, which I think must be a g-computation-style non-parametric estimation routine… but I haven’t dug into the R code…
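Against your binary PyMC model above, the imputation idea might look roughly like this (a sketch only; it works off intercept_y and beta_t and ignores the latent correlated error, which a full g-computation would have to integrate over):

# Rough sketch of the g-computation idea: rather than reading off beta_t
# (which lives on the non-collapsible log-odds scale), impute the outcome
# under do(X=1) and do(X=0) for each posterior draw and take the difference.
import numpy as np
from scipy.special import expit as invlogit

post = idata_iv_model_bin.posterior  # assumes the binary model above was sampled
a = post['intercept_y'].values.ravel()
b = post['beta_t'].values.ravel()

# Risk difference per draw; this simple version skips the latent correlated
# error term, which a full g-computation would integrate over.
ate_draws = invlogit(a + b) - invlogit(a)
print(ate_draws.mean(), np.quantile(ate_draws, [0.03, 0.97]))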


Interim update.

I think the issue lies within the GLM portion of the modeling. If X and Y are continuous (normally distributed) with a binary instrument, I can retrieve the DGP easily.

import pymc as pm
import numpy as np
import pandas as pd
import arviz as az
import pytensor.tensor as pt

rng = np.random.default_rng()

sample_size = 1000

I = rng.binomial(p=0.8, n=1, size=sample_size)  # binary instrument
U = rng.normal(loc=0, scale=1, size=sample_size)  # unobserved confounder
# Same structural equations as before, but with Gaussian noise instead of a logit link
X_cont = rng.normal(loc=-1 - (3 * (1 - I)) + 0.6 * U, scale=1, size=sample_size)
Y_cont = rng.normal(loc=-1 - 0.5 * X_cont + 0.9 * U, scale=1, size=sample_size)

dat_sim = pd.DataFrame({
    'X_cont': X_cont,
    'I': I,
    'Y_cont': Y_cont,
    'U': U
})

with pm.Model() as iv_model_cont:
    intercept_y = pm.Normal('intercept_y', 0, .5)
    intercept_t = pm.Normal('intercept_t', 0, .5)
    beta_t = pm.Normal('beta_t', 0, 1)  # effect of X_cont on Y_cont (the target parameter)
    beta_z = pm.Normal('beta_z', 0, 1)  # effect of the instrument on X_cont
    sd_dist = pm.HalfCauchy.dist(beta=5)
    chol, _, _ = pm.LKJCholeskyCov('chol_cov', eta=2, n=2, sd_dist=sd_dist)

    # Linear predictors for the outcome and treatment equations
    mu_y = pm.Deterministic('mu_y', beta_t * dat_sim.X_cont + intercept_y)
    mu_t = pm.Deterministic('mu_t', beta_z * dat_sim.I + intercept_t)
    mu = pm.Deterministic('mu', pt.stack([mu_y, mu_t], axis=1))

    # Here the outcome and treatment are observed directly as a correlated pair
    mv = pm.MvNormal('mv', mu=mu, chol=chol,
                     observed=np.stack((dat_sim.Y_cont, dat_sim.X_cont), axis=1),
                     shape=(sample_size, 2))

    idata_iv_model_cont = pm.sample()

az.summary(idata_iv_model_cont, var_names=['intercept_y', 'intercept_t', 'beta_t', 'beta_z'])


              mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
intercept_y  -0.965  0.070  -1.107   -0.845      0.002    0.001    1875.0    2205.0    1.0
intercept_t  -3.961  0.084  -4.116   -3.804      0.002    0.001    1887.0    2449.0    1.0
beta_t       -0.488  0.035  -0.553   -0.424      0.001    0.001    1679.0    2159.0    1.0
beta_z        2.979  0.094   2.803    3.154      0.002    0.002    1744.0    2218.0    1.0

In the binary case, the ivtools package is able to retrieve the DGP, both with 2SLS and with the g-estimation approach, when the sample size is large (reliably with sample sizes of around 20,000). I’m a bit confused as to why the 2SLS approach from linearmodels and my PyMC-based attempt at 2SLS do not work:

with pm.Model() as iv_2sls:
    intercept_y = pm.Normal('intercept_y', 0, .5)
    intercept_t = pm.Normal('intercept_t', 0, .5)
    beta_t = pm.Normal('beta_t', 0, 1)
    beta_z = pm.Normal('beta_z', 0, 1)

    # First stage: treatment as a function of the instrument
    mu_t = pm.Deterministic('mu_t', beta_z * dat_sim['I'] + intercept_t)
    t = pm.Bernoulli('likelihood_t', logit_p=mu_t, observed=dat_sim['X'], shape=(sample_size,))

    # Second stage: outcome as a function of the (observed) treatment
    mu_y = pm.Deterministic('mu_y', beta_t * t + intercept_y)
    y = pm.Bernoulli('likelihood_y', logit_p=mu_y, observed=dat_sim['Y'], shape=(sample_size,))

    idata_iv_2sls = pm.sample()

az.summary(idata_iv_2sls, var_names=['intercept_y', 'intercept_t', 'beta_t', 'beta_z'])

              mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
intercept_y  -0.955  0.025  -1.003   -0.907      0.001    0.000    2141.0    2389.0    1.0
intercept_t  -3.553  0.131  -3.788   -3.298      0.003    0.002    1500.0    1954.0    1.0
beta_t        0.007  0.053  -0.093    0.104      0.001    0.001    2438.0    2201.0    1.0
beta_z        2.631  0.133   2.383    2.880      0.003    0.002    1564.0    1902.0    1.0

I guess I have to go dumpster diving in the code of the R package. In the meantime, if anyone knows what’s going on, please let me know. I would love to use PyMC for this project rather than some frequentist R package.


PSA: Do not do 2SLS mindlessly in the setting of binary outcomes.

I think the underlying issue is related to the “non-collapsibility” of odds-ratios. My understanding is that a non-linear approach or g-estimation would be the way to move forward. As of July 2024, I have not found a solid Bayesian approach to this problem. The package looks promising, but does not handle cases with binary outcomes yet. If anyone out there has this figured out, please let me know.