Learning Decision Policies with Instrumental Variables through Double Machine Learning (2405.08498v3)
Abstract: A common issue in learning decision-making policies in data-rich settings is spurious correlations in the offline dataset, which can be caused by hidden confounders. Instrumental variable (IV) regression, which utilises a key unconfounded variable known as the instrument, is a standard technique for learning causal relationships between confounded action, outcome, and context variables. Most recent IV regression algorithms use a two-stage approach, where a deep neural network (DNN) estimator learnt in the first stage is directly plugged into the second stage, in which another DNN is used to estimate the causal effect. Naively plugging in the first-stage estimator can cause heavy bias in the second stage, especially when the first-stage estimator suffers from regularisation bias. We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regressions and effectively learns high-performing policies. We derive a novel learning objective to reduce bias and design the DML-IV algorithm following the double/debiased machine learning (DML) framework. The learnt DML-IV estimator has a strong convergence rate and $O(N^{-1/2})$ suboptimality guarantees that match those obtained when the dataset is unconfounded. DML-IV outperforms state-of-the-art IV regression methods on IV regression benchmarks and learns high-performing policies in the presence of instruments.
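The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is classical linear two-stage least squares on synthetic data, not DML-IV and not the paper's DNN-based estimators. The data-generating process, the function names `naive_ols` and `two_stage_least_squares`, and all coefficients are assumptions made for illustration. The sketch only shows why an instrument helps under hidden confounding; it does not reproduce the regularisation bias or the debiasing objective discussed above.

```python
import numpy as np


def naive_ols(a, y):
    """Slope from regressing outcome y directly on action a (ignores confounding)."""
    design = np.column_stack([np.ones_like(a), a])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def two_stage_least_squares(z, a, y):
    """Linear two-stage IV estimate of the effect of a on y using instrument z."""
    # Stage 1: predict the action from the instrument.
    z_design = np.column_stack([np.ones_like(z), z])
    stage1_coef, *_ = np.linalg.lstsq(z_design, a, rcond=None)
    a_hat = z_design @ stage1_coef
    # Stage 2: regress the outcome on the predicted action (plug-in step).
    a_design = np.column_stack([np.ones_like(a_hat), a_hat])
    stage2_coef, *_ = np.linalg.lstsq(a_design, y, rcond=None)
    return stage2_coef[1]


# Synthetic data (assumed for illustration): u is a hidden confounder that
# affects both the action and the outcome; z is an instrument independent of u.
rng = np.random.default_rng(0)
n = 20_000
u = rng.normal(size=n)
z = rng.normal(size=n)
a = z + u + 0.1 * rng.normal(size=n)
true_effect = 2.0
y = true_effect * a + 3.0 * u + 0.1 * rng.normal(size=n)

ols_estimate = naive_ols(a, y)
iv_estimate = two_stage_least_squares(z, a, y)
print(f"true effect: {true_effect:.2f}")
print(f"naive OLS:   {ols_estimate:.2f}")
print(f"two-stage:   {iv_estimate:.2f}")
```

In this setup the naive regression absorbs the confounder's influence and overestimates the effect, while the two-stage estimate stays close to the true coefficient because the instrument only moves the outcome through the action.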