Automating the Selection of Proxy Variables of Unmeasured Confounders (2405.16130v1)

Published 25 May 2024 in cs.LG and stat.ME

Abstract: Recently, interest has grown in using proxy variables of unobserved confounding to infer causal effects from observational data in the presence of unmeasured confounders. One difficulty inhibiting their practical use is finding valid proxy variables of unobserved confounding for a target causal effect of interest. These proxy variables are typically justified by background knowledge. In this paper, we investigate the estimation of the causal effects of multiple treatments on a single outcome, all of which are affected by unmeasured confounders, within a linear causal model and without prior knowledge of the validity of proxy variables. Specifically, we first extend the existing proxy variable estimator, originally designed for a single unmeasured confounder, to scenarios where multiple unmeasured confounders exist between the treatments and the outcome. Subsequently, we present two sets of precise identifiability conditions for selecting valid proxy variables of unmeasured confounders, based on second-order and higher-order statistics of the data, respectively. Moreover, we propose two data-driven methods for selecting proxy variables and for estimating causal effects without bias. Theoretical analysis demonstrates the correctness of our proposed algorithms. Experimental results on both synthetic and real-world data show the effectiveness of the proposed approach.
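The abstract's starting point is the existing linear proxy variable estimator for a single unmeasured confounder. The sketch below illustrates only that single-confounder baseline, not the paper's multi-confounder extension or its proxy-selection procedures. It assumes a linear model with one latent U (U -> X, U -> Y, X -> Y), a treatment-side proxy Z and an outcome-side proxy W that are already known to be valid: Z has no direct effect on Y, and W is not affected by X. The helper names (`cov`, `proxy_effect`, `simulate`) and all coefficients are hypothetical and chosen for illustration. Under these assumptions, Y - beta*X - gamma*W is uncorrelated with both X and Z for a suitable gamma, so beta follows from two covariance moment equations.

```python
import random


def cov(u, v):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def proxy_effect(x, y, z, w):
    """Linear two-proxy estimate of the effect of X on Y (one confounder U).

    Solves the covariance moment equations
        cov(Y, X) = beta * var(X)    + gamma * cov(W, X)
        cov(Y, Z) = beta * cov(X, Z) + gamma * cov(W, Z)
    for beta, where Z is a treatment-side proxy of U and W an outcome-side
    proxy of U. Both proxies are assumed valid; nothing here checks that.
    """
    det = cov(x, x) * cov(w, z) - cov(x, z) * cov(w, x)
    if abs(det) < 1e-12:
        raise ValueError("proxies are (nearly) uninformative about U")
    return (cov(y, x) * cov(w, z) - cov(y, z) * cov(w, x)) / det


def simulate(n, beta, seed=0):
    """Hypothetical linear model: U -> X, U -> Y, X -> Y, U -> Z, U -> W."""
    rng = random.Random(seed)
    x, y, z, w = [], [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                               # unmeasured confounder
        xi = 0.8 * u + rng.gauss(0, 1)                    # treatment
        x.append(xi)
        y.append(beta * xi + 1.5 * u + rng.gauss(0, 1))   # outcome
        z.append(0.7 * u + rng.gauss(0, 1))               # treatment-side proxy
        w.append(0.9 * u + rng.gauss(0, 1))               # outcome-side proxy
    return x, y, z, w


if __name__ == "__main__":
    x, y, z, w = simulate(20000, beta=2.0)
    naive = cov(x, y) / cov(x, x)  # ordinary regression slope, biased by U
    print(f"naive slope: {naive:.3f}, proxy estimate: {proxy_effect(x, y, z, w):.3f}")
```

With these illustrative coefficients and unit noise variances, the naive regression slope exceeds the true effect by 0.8 * 1.5 / (1 + 0.8^2), about 0.73 in population terms. The proxy estimate should land close to the true beta, because the confounding enters both moment equations through the same proxy-based correction term.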

Authors (6)
  1. Feng Xie (68 papers)
  2. Zhengming Chen (8 papers)
  3. Shanshan Luo (13 papers)
  4. Wang Miao (43 papers)
  5. Ruichu Cai (68 papers)
  6. Zhi Geng (32 papers)
Citations (1)
