Robustness Against Weak or Invalid Instruments: Exploring Nonlinear Treatment Models with Machine Learning (2203.12808v4)
Abstract: We discuss causal inference for observational studies with possibly invalid instrumental variables. We propose a novel methodology called two-stage curvature identification (TSCI), which explores the nonlinear treatment model with machine learning. The first-stage machine learning improves the instrumental variable's strength and adjusts for different forms of violation of the instrumental variable assumptions. The success of TSCI requires the instrumental variable's effect on the treatment to differ from its violation form. A novel bias correction step removes the bias resulting from the potentially high complexity of machine learning. The proposed TSCI estimator is shown to be asymptotically unbiased and Gaussian even if the machine learning algorithm does not consistently estimate the treatment model. Furthermore, we design a data-dependent method to choose the best among several candidate violation forms. We apply TSCI to study the effect of education on earnings.
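The identification idea in the abstract can be illustrated with a small simulation. The sketch below is not the TSCI estimator itself: it assumes a single instrument whose direct effect on the outcome is linear (the violation form) while its effect on the treatment also has a quadratic part, it replaces the first-stage machine learning with a plain polynomial least-squares fit, and it omits the bias correction step. The data-generating process and all function names (`simulate`, `project_out`, `naive_iv`, `curvature_iv`) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n=20000, beta=1.0, seed=0):
    """Hypothetical data: invalid instrument z, treatment d, outcome y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    u = rng.normal(size=n)  # unmeasured confounder of d and y
    # Treatment model is nonlinear (linear + quadratic) in the instrument.
    d = z + z**2 + u + rng.normal(size=n)
    # Outcome: z enters directly and linearly, so z is an invalid instrument.
    y = beta * d + 0.5 * z + u + rng.normal(size=n)
    return z, d, y


def project_out(a, V):
    """Residual of a after least-squares projection onto the columns of V."""
    coef, *_ = np.linalg.lstsq(V, a, rcond=None)
    return a - V @ coef


def naive_iv(z, d, y):
    """Standard IV ratio that wrongly treats z as a valid instrument."""
    V = np.ones((len(z), 1))  # intercept only
    z_r, d_r, y_r = (project_out(a, V) for a in (z, d, y))
    return float(z_r @ y_r) / float(z_r @ d_r)


def curvature_iv(z, d, y):
    """Use only the part of the fitted treatment model that the assumed
    violation form (intercept + linear z) cannot explain."""
    n = len(z)
    # First stage: polynomial fit as a stand-in for a machine learning fit.
    basis = np.column_stack([np.ones(n), z, z**2])
    coef, *_ = np.linalg.lstsq(basis, d, rcond=None)
    f_hat = basis @ coef
    # Assumed violation form: intercept and linear term in z.
    V = np.column_stack([np.ones(n), z])
    f_r, d_r, y_r = (project_out(a, V) for a in (f_hat, d, y))
    return float(f_r @ y_r) / float(f_r @ d_r)


if __name__ == "__main__":
    z, d, y = simulate()
    print("naive IV estimate:    ", naive_iv(z, d, y))
    print("curvature IV estimate:", curvature_iv(z, d, y))
```

In this simulated design the naive estimate absorbs the instrument's direct effect on the outcome, whereas the curvature-based estimate should land close to the true effect of 1.0, because the quadratic part of the treatment model is not explained by the linear violation form. If the treatment model were itself linear in the instrument, nothing would remain after projecting out the violation form and the second estimator would break down, which is the condition the abstract states for TSCI to succeed.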