Disentangled Latent Representation Learning for Tackling the Confounding M-Bias Problem in Causal Inference (2312.05404v1)
Abstract: Estimating causal effects from observational data is a fundamental task in causal inference. However, latent confounders pose major challenges in observational data, introducing, for example, confounding bias and M-bias. Recent data-driven causal effect estimators tackle confounding bias via balanced representation learning but assume that the system has no M-bias, so they fail to handle it. In this paper, we identify a challenging and unsolved problem caused by a variable that induces confounding bias and M-bias simultaneously. To address co-occurring M-bias and confounding bias, we propose DLRCE, a novel Disentangled Latent Representation learning framework that learns latent representations from proxy variables for unbiased Causal effect Estimation from observational data. Specifically, DLRCE learns three sets of latent representations from the measured proxy variables to adjust for both confounding bias and M-bias. Extensive experiments on synthetic data and three real-world datasets demonstrate that DLRCE significantly outperforms state-of-the-art estimators when both confounding bias and M-bias are present.
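The problem of a single variable that causes both confounding bias and M-bias can be illustrated with a small simulation. This sketch does not implement DLRCE. It uses a hypothetical linear-Gaussian "butterfly" graph in which every path weight is 1:

- U1 and U2 are latent.
- M is a confounder of T and Y (M -> T, M -> Y).
- M is also a collider on the path T <- U1 -> M <- U2 -> Y.

Leaving M out leaves the back-door through M open. Adjusting for M opens the collider path. Adjusting for {M, U1} is valid but needs the unobserved U1. The names `ols_coef` and `simulate_butterfly` are illustrative only.

```python
import numpy as np


def ols_coef(y, *covariates):
    """Return the OLS coefficient on the first covariate (an intercept is included)."""
    X = np.column_stack([np.ones_like(y), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def simulate_butterfly(n=100_000, tau=1.0, seed=0):
    """Hypothetical linear-Gaussian butterfly model with all path weights set to 1.

    U1, U2 are latent; M confounds T and Y (M -> T, M -> Y) and is also
    a collider on the path T <- U1 -> M <- U2 -> Y. The true effect of T on Y is tau.
    """
    rng = np.random.default_rng(seed)
    u1 = rng.normal(size=n)
    u2 = rng.normal(size=n)
    m = u1 + u2 + rng.normal(size=n)
    t = u1 + m + rng.normal(size=n)
    y = tau * t + m + u2 + rng.normal(size=n)
    return u1, u2, m, t, y


u1, u2, m, t, y = simulate_butterfly()
estimates = {
    "no adjustment": ols_coef(y, t),                # back-door via M left open
    "adjust for M": ols_coef(y, t, m),              # collider path via M opened
    "adjust for M and U1": ols_coef(y, t, m, u1),   # valid set, but U1 is latent
}
for name, est in estimates.items():
    print(f"{name:>20}: {est:.3f}")
```

For this particular parameterisation, the population values of the three estimates are 12/7 ≈ 1.71, 0.80 and 1.00. Both the unadjusted and the M-adjusted estimates are biased, in opposite directions. Only the set that includes a latent variable recovers the true effect of 1. This is the kind of situation in which learning representations of latent factors from proxies becomes relevant.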
- Debo Cheng
- Yang Xie
- Ziqi Xu
- Jiuyong Li
- Lin Liu
- Jixue Liu
- Yinghao Zhang
- Zaiwen Feng