
Robustness Against Weak or Invalid Instruments: Exploring Nonlinear Treatment Models with Machine Learning (2203.12808v4)

Published 24 Mar 2022 in stat.ME, math.ST, stat.ML, and stat.TH

Abstract: We discuss causal inference for observational studies with possibly invalid instrumental variables. We propose a novel methodology called two-stage curvature identification (TSCI) by exploring the nonlinear treatment model with machine learning. The first-stage machine learning enables improving the instrumental variable's strength and adjusting for different forms of violating the instrumental variable assumptions. The success of TSCI requires the instrumental variable's effect on treatment to differ from its violation form. A novel bias correction step is implemented to remove bias resulting from the potentially high complexity of machine learning. Our proposed TSCI estimator is shown to be asymptotically unbiased and Gaussian even if the machine learning algorithm does not consistently estimate the treatment model. Furthermore, we design a data-dependent method to choose the best among several candidate violation forms. We apply TSCI to study the effect of education on earnings.
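The identification idea in the abstract, that the instrument's effect on the treatment must differ in form from its violation, can be illustrated with a small simulation. The sketch below is not the TSCI estimator: it swaps the first-stage machine learning for a cubic polynomial fit, assumes the violation form is known to be linear, and omits both the bias correction and the data-dependent choice of violation form. The data-generating process and the function names `simulate`, `curvature_iv`, and `naive_tsls` are hypothetical, chosen only for illustration.

```python
import numpy as np


def simulate(n=5000, beta=1.0, seed=0):
    """Toy data (an assumption, not from the paper): the instrument z affects
    the treatment d nonlinearly and the outcome y directly and linearly."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    u = rng.normal(size=n)                              # unmeasured confounder
    d = z + z**2 + u + rng.normal(size=n)               # nonlinear treatment model
    y = beta * d + 0.5 * z + u + rng.normal(size=n)     # 0.5 * z is the violation
    return z, d, y


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def curvature_iv(z, d, y):
    """Two-stage estimate that adjusts for a linear violation in z and relies
    on the curvature (z**2, z**3) of the first stage for identification."""
    n = len(z)
    # First stage: cubic polynomial fit, a stand-in for a machine learning fit.
    basis = np.column_stack([np.ones(n), z, z**2, z**3])
    d_hat = basis @ ols(basis, d)
    # Second stage: regress y on the fitted treatment plus the violation form.
    violation = np.column_stack([np.ones(n), z])
    return ols(np.column_stack([d_hat, violation]), y)[0]


def naive_tsls(z, d, y):
    """Standard two-stage least squares that treats z as a valid instrument."""
    n = len(z)
    Z = np.column_stack([np.ones(n), z])
    d_hat = Z @ ols(Z, d)
    return ols(np.column_stack([np.ones(n), d_hat]), y)[1]


if __name__ == "__main__":
    z, d, y = simulate()
    print("curvature-based estimate:", curvature_iv(z, d, y))
    print("naive TSLS estimate:     ", naive_tsls(z, d, y))
```

In this toy setup the true effect is 1. The curvature-based estimate should land close to it, while the naive estimate absorbs the direct effect of the instrument and is biased upward. If the treatment model were linear in the instrument, the fitted treatment would be collinear with the violation form and the second stage would no longer identify the effect.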
