Model-based causal feature selection for general response types (2309.12833v4)

Published 22 Sep 2023 in stat.ME, math.ST, stat.ML, and stat.TH

Abstract: Discovering causal relationships from observational data is a fundamental yet challenging task. Invariant causal prediction (ICP, Peters et al., 2016) is a method for causal feature selection which requires data from heterogeneous settings and exploits that causal models are invariant. ICP has been extended to general additive noise models and to nonparametric settings using conditional independence tests. However, the latter often suffer from low power (or poor type I error control) and additive noise models are not suitable for applications in which the response is not measured on a continuous scale, but reflects categories or counts. Here, we develop transformation-model (TRAM) based ICP, allowing for continuous, categorical, count-type, and uninformatively censored responses (these model classes, generally, do not allow for identifiability when there is no exogenous heterogeneity). As an invariance test, we propose TRAM-GCM based on the expected conditional covariance between environments and score residuals with uniform asymptotic level guarantees. For the special case of linear shift TRAMs, we also consider TRAM-Wald, which tests invariance based on the Wald statistic. We provide an open-source R package 'tramicp' and evaluate our approach on simulated data and in a case study investigating causal features of survival in critically ill patients.
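
The generic ICP recipe mentioned in the abstract (test every candidate predictor set for invariance across environments, then intersect the accepted sets) can be sketched as follows. This is a minimal illustration under loud assumptions, not the 'tramicp' implementation: ordinary least-squares residuals stand in for TRAM score residuals, the environment is a single numeric variable, and the function names `ols_residuals`, `gcm_pvalue`, and `icp` as well as the simulated data are hypothetical.

```python
import itertools
import math

import numpy as np


def ols_residuals(target, X):
    """Residuals of a least-squares fit of `target` on X plus an intercept."""
    design = np.column_stack([np.ones(len(target)), X])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def gcm_pvalue(res_y, res_e):
    """Two-sided p-value of a covariance-type test on two residual vectors.

    The statistic is the studentised mean of the residual products,
    compared against a standard normal distribution.
    """
    prod = res_y * res_e
    stat = math.sqrt(len(prod)) * prod.mean() / prod.std(ddof=1)
    return math.erfc(abs(stat) / math.sqrt(2))


def icp(y, X, env, alpha=0.05):
    """Intersect all predictor subsets whose invariance test is not rejected."""
    d = X.shape[1]
    pvals = {}
    for k in range(d + 1):
        for S in itertools.combinations(range(d), k):
            XS = X[:, list(S)]
            # Assumption: OLS residuals replace the TRAM score residuals.
            pvals[S] = gcm_pvalue(ols_residuals(y, XS), ols_residuals(env, XS))
    accepted = [set(S) for S, p in pvals.items() if p > alpha]
    selected = set.intersection(*accepted) if accepted else set()
    return selected, pvals


# Hypothetical simulated data: X[:, 0] is a parent of y, X[:, 1] is a child.
rng = np.random.default_rng(0)
n = 2000
env = rng.integers(0, 2, size=n).astype(float)
x_parent = env + rng.normal(size=n)
y = x_parent + rng.normal(size=n)
x_child = y + env + rng.normal(size=n)
X = np.column_stack([x_parent, x_child])

selected, pvals = icp(y, X, env)
print("selected predictors:", selected)
```

In this toy setup only the set containing the parent should typically pass the invariance test, so the intersection should contain at most the parent's index. Because each test is run at level `alpha`, the invariant set itself is occasionally rejected, in which case the output is empty.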

References (91)
  1. J. Aldrich. Autonomy. Oxford Economic Papers, 41(1):15–34, 1989. doi:10.1093/oxfordjournals.oep.a041889.
  2. On the Markov Equivalence of Chain Graphs, Undirected Graphs, and Acyclic Digraphs. Scandinavian Journal of Statistics, 24(1):81–102, 1997. doi:10.1111/1467-9469.00050.
  3. L. Barbanti and T. Hothorn. A Transformation Perspective on Marginal and Conditional Models. arXiv preprint, 2019. doi:10.48550/arxiv.1910.09219.
  4. Residuals for Relative Risk Regression. Biometrika, 75(1):65–74, 1988. doi:10.1093/biomet/75.1.65.
  5. V. Bengs and H. Holzmann. Uniform Approximation in Classical Weak Convergence Theory. arXiv preprint, 2019. doi:10.48550/arXiv.1903.09864.
  6. The Conditional Permutation Test for Independence While Controlling for Confounders. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(1):175–197, 2019. doi:10.1111/rssb.12340.
  7. An Analysis of Transformations Revisited. Journal of the American Statistical Association, 76(374):296–311, 1981. doi:10.1080/01621459.1981.10477649.
  8. Foundations of Structural Causal Models with Cycles and Latent Variables. The Annals of Statistics, 49(5):2885–2915, 2021. doi:10.1214/21-AOS2064.
  9. An Analysis of Transformations. Journal of the Royal Statistical Society: Series B (Methodological), 26(2):211–243, 1964. doi:10.1111/j.2517-6161.1964.tb00553.x.
  10. Panning for Gold: ‘Model-X’ Knockoffs for High Dimensional Controlled Variable Selection. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(3):551–577, 2018. doi:10.1111/rssb.12265.
  11. R. Castelo and T. Kocka. On Inclusion-Driven Learning of Bayesian Networks. Journal of Machine Learning Research, 4(Sep):527–574, 2003. URL https://www.jmlr.org/papers/v4/castelo03a.html.
  12. Analysis of Transformation Models with Censored Data. Biometrika, 82(4):835–845, 1995. doi:10.1093/biomet/82.4.835.
  13. Double/Debiased/Neyman Machine Learning of Treatment Effects. American Economic Review, 107(5):261–65, 2017. doi:10.1257/aer.p20171038.
  14. D. M. Chickering. Optimal Structure Identification with Greedy Search. Journal of Machine Learning Research, 3(Nov):507–554, 2002. URL https://www.jmlr.org/papers/v3/chickering02b.html.
  15. R. Christiansen and J. Peters. Switching Regression Models and Causal Inference in the Presence of Discrete Latent Variables. Journal of Machine Learning Research, 21(41):1–46, 2020. URL https://www.jmlr.org/papers/v21/19-407.html.
  16. D. R. Cox. Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202, 1972. doi:10.1111/j.2517-6161.1972.tb00899.x.
  17. Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics, 44(3):837–845, 1988. doi:10.2307/2531595.
  18. Identifying causes of Pyrocumulonimbus (PyroCb). In NeurIPS 2022 Workshop on Causality for Real-world Impact, 2022. URL https://openreview.net/forum?id=rM6HO4h1MI.
  19. V. Didelez and M. J. Stensrud. On the Logic of Collapsibility for Causal Effect Measures. Biometrical Journal, 64(2):235–242, 2022. doi:10.1002/bimj.202000305.
  20. K. Doksum. Empirical Probability Plots and Statistical Inference for Nonlinear Models in the Two-Sample Case. The Annals of Statistics, 2(2):267–277, 1974. doi:10.1214/aos/1176342662.
  21. C. P. Farrington. Residuals for Proportional Hazards Models with Interval-Censored Survival Data. Biometrics, 56(2):473–482, 2000. doi:10.1111/j.0006-341x.2000.00473.x.
  22. Model Inconsistency, Illustrated by the Cox Proportional Hazards Model. Statistics in Medicine, 14(8):735–746, 1995. doi:10.1002/sim.4780140804.
  23. Autonomy of Economic Relations. Technical report, Universitets Socialøkonomiske Institutt, Oslo, Norway, 1948.
  24. Kernel Measures of Conditional Dependence. In Advances in Neural Information Processing Systems, volume 20. Curran Associates, Inc., 2007. URL https://proceedings.neurips.cc/paper_files/paper/2007/file/3a0772443a0739141292a5429b952fe6-Paper.pdf.
  25. M. H. Gail. Adjusting for Covariates That Have the Same Distribution in Exposed and Unexposed Cohorts. In Modern Statistical Methods in Chronic Disease Epidemiology, pages 3–18. John Wiley & Sons, 1986.
  26. Biased Estimates of Treatment Effect in Randomized Experiments with Nonlinear Regressions and Omitted Covariates. Biometrika, 71(3):431–444, 1984. doi:10.1093/biomet/71.3.431.
  27. Review of Causal Discovery Methods Based on Graphical Models. Frontiers in Genetics, 10, 2019. doi:10.3389/fgene.2019.00524.
  28. S. Greenland. Absence of Confounding Does Not Correspond to Collapsibility of the Rate Ratio or Rate Difference. Epidemiology, 7(5):498–501, 1996. doi:10.1097/00001648-199609000-00008.
  29. Confounding and Collapsibility in Causal Inference. Statistical Science, 14(1):29–46, 1999. doi:10.1214/ss/1009211805.
  30. A Kernel Statistical Test of Independence. In Proceedings of Advances in Neural Information Processing Systems, volume 20 of NeurIPS, 2007. URL https://proceedings.neurips.cc/paper/2007/file/d5cfead94f5350c12c322b5b664544c1-Paper.pdf.
  31. J. Guo and Z. Geng. Collapsibility of Logistic Regression Coefficients. Journal of the Royal Statistical Society. Series B (Methodological), 57(1):263–267, 1995. doi:10.1111/j.2517-6161.1995.tb02029.x.
  32. Causal Feature Selection. In Computational Methods of Feature Selection, pages 79–102. Chapman and Hall/CRC, 1st edition, 2007. doi:10.1201/9781584888796.
  33. T. Haavelmo. The Statistical Implications of a System of Simultaneous Equations. Econometrica, 11(1):1–12, 1943. doi:10.2307/1905714.
  34. A. Hauser and P. Bühlmann. Jointly Interventional and Observational Data: Estimation of Interventional Markov Equivalence Classes of Directed Acyclic Graphs. Journal of the Royal Statistical Society: Series B (Methodological), 77(1):291–318, 2015. doi:10.1111/rssb.12071.
  35. Y.-B. He and Z. Geng. Active Learning of Causal Networks with Intervention Experiments and Optimal Designs. Journal of Machine Learning Research, 9(84):2523–2547, 2008. URL https://www.jmlr.org/papers/v9/he08a.html.
  36. Invariant Causal Prediction for Nonlinear Models. Journal of Causal Inference, 6(2):20170016, 2018. doi:10.1515/jci-2017-0016.
  37. Causal Inference. CRC, 2010.
  38. The Simpson’s Paradox Unraveled. International Journal of Epidemiology, 40(3):780–785, 2011. doi:10.1093/ije/dyr041.
  39. Conditional Transformation Models. Journal of the Royal Statistical Society. Series B (Methodological), 76(1):3–27, 2014. doi:10.1111/rssb.12017.
  40. Most Likely Transformations. Scandinavian Journal of Statistics, 45(1):110–134, 2018. doi:10.1111/sjos.12291.
  41. tram: Transformation Models, 2022. URL https://CRAN.R-project.org/package=tram. R package version 0.7-2.
  42. Nonlinear Causal Discovery With Additive Noise Models. In Advances in Neural Information Processing Systems, volume 21 of NeurIPS, pages 689–696. Curran Associates, Inc., 2008. URL https://proceedings.neurips.cc/paper_files/paper/2008/file/f7664060cc52bc6f3d620bcedc94a4b6-Paper.pdf.
  43. Random Survival Forests. The Annals of Applied Statistics, 2(3):841–860, 2008. doi:10.1214/08-AOAS169.
  44. M. Jay. generalhoslem: Goodness of Fit Tests for Logistic Regression Models, 2019. URL https://CRAN.R-project.org/package=generalhoslem. R package version 1.3.4.
  45. The statistical analysis of failure time data. John Wiley & Sons, 2011.
  46. pcalg: Methods for Graphical Models and Causal Inference, 2022. URL https://CRAN.R-project.org/package=pcalg. R package version 2.7-6.
  47. D. G. Kleinbaum and M. Klein. Parametric Survival Models. In Survival Analysis, pages 289–361. Springer-Verlag, 2012.
  48. The support prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of internal medicine, 122(3):191–203, 1995. doi:10.7326/0003-4819-122-3-199502010-00007. The SUPPORT2 dataset is available at https://hbiostat.org/data/.
  49. L. Kook and T. Hothorn. Regularized Transformation Models: The tramnet Package. The R Journal, 13(1):581–594, 2021. doi:10.32614/rj-2021-054.
  50. Distributional Anchor Regression. Statistics and Computing, 32(3):39, 2022. doi:10.1007/s11222-022-10097-z.
  51. Survival forests under test: Impact of the proportional hazards assumption on prognostic and predictive forests for amyotrophic lateral sclerosis survival. Statistical Methods in Medical Research, 29(5):1403–1419, 2020. doi:10.1177/0962280219862586.
  52. S. W. Lagakos. The Graphical Evaluation of Explanatory Variables in Proportional Hazard Regression Models. Biometrika, 68(1):93–98, 1981. doi:10.1093/biomet/68.1.93.
  53. A. D. Laksafoss. Invariant Causal Prediction for Event and Time to Event Data. Master’s thesis, University of Copenhagen, Department of Mathematical Sciences, 2020.
  54. Conditional Independence Testing in Hilbert Spaces with Applications to Functional Data Analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 84(5):1821–1850, 2022. doi:10.1111/rssb.12544.
  55. T. Martinussen and S. Vansteelandt. On Collapsibility and Confounding Bias in Cox and Aalen Regression Models. Lifetime Data Analysis, 19(3):279–296, 2013. doi:10.1007/s10985-013-9242-z.
  56. P. McCullagh. Regression Models for Ordinal Data. Journal of the Royal Statistical Society: Series B (Methodological), 42(2):109–127, 1980. doi:10.1111/j.2517-6161.1980.tb01109.x.
  57. P. McCullagh and J. A. Nelder. Generalized Linear Models. Routledge, 2019. doi:10.1201/9780203753736.
  58. Efficient Sieve Maximum Likelihood Estimation of Time-transformation Models. Journal of Statistical Theory and Practice, 7:285–303, 2013. doi:10.1080/15598608.2013.772835.
  59. N. Meinshausen. Quantile Regression Forests. Journal of Machine Learning Research, 7(35):983–999, 2006. URL http://jmlr.org/papers/v7/meinshausen06a.html.
  60. N. Meinshausen. InvariantCausalPrediction: Invariant Causal Prediction, 2019. URL https://CRAN.R-project.org/package=InvariantCausalPrediction. R package version 0.8.
  61. N. Meinshausen and P. Bühlmann. High-Dimensional Graphs and Variable Selection with the Lasso. The Annals of Statistics, 34(3):1436–1462, 2006. doi:10.1214/009053606000000281.
  62. Methods for Causal Inference from Gene Perturbation Experiments and Validation. Proceedings of the National Academy of Sciences of the United States of America, 113(27):7361–7368, 2016. doi:10.1073/pnas.1510493113.
  63. Invariant Ancestry Search. arXiv preprint, 2022. doi:10.48550/arxiv.2202.00913.
  64. Estimating the Effect of Joint Interventions from Observational Data in Sparse High-Dimensional Settings. The Annals of Statistics, 45(2):647–674, 2017. doi:10.1214/16-aos1462.
  65. J. Pearl. Causality. Cambridge university press, 2009.
  66. J. Peters and P. Bühlmann. Identifiability of Gaussian Structural Equation Models with Equal Error Variances. Biometrika, 101(1):219–228, 2013. doi:10.1093/biomet/ast043.
  67. J. Peters and R. D. Shah. GeneralisedCovarianceMeasure: Test for Conditional Independence Based on the Generalized Covariance Measure (GCM), 2022. URL https://CRAN.R-project.org/package=GeneralisedCovarianceMeasure. R package version 0.2.0.
  68. Causal Discovery with Continuous Additive Noise Models. Journal of Machine Learning Research, 15(58):2009–2053, 2014. URL http://jmlr.org/papers/v15/peters14a.html.
  69. Causal Inference by Using Invariant Prediction: Identification and Confidence Intervals. Journal of the Royal Statistical Society: Series B (Methodological), 78(5):947–1012, 2016. doi:10.1111/rssb.12167.
  70. Kernel-Based Tests for Joint Independence. Journal of the Royal Statistical Society: Series B (Methodological), 80(1):5–31, 2018. doi:10.1111/rssb.12235.
  71. Invariant Causal Prediction for Sequential Data. Journal of the American Statistical Association, 114(527):1264–1276, 2019. doi:10.1080/01621459.2018.1491403.
  72. E. Pulkstenis and T. J. Robinson. Goodness-of-Fit Tests for Ordinal Response Regression Models. Statistics in Medicine, 23(6):999–1014, 2004. doi:10.1002/sim.1659.
  73. pROC: Display and Analyze ROC Curves, 2021. URL https://CRAN.R-project.org/package=pROC. R package version 1.18.0.
  74. Some Surprising Results about Covariate Adjustment in Logistic Regression Models. International Statistical Review / Revue Internationale de Statistique, 59(2):227–240, 1991. doi:10.2307/1403444.
  75. Anchor Regression: Heterogeneous Data Meet Causality. Journal of the Royal Statistical Society: Series B (Methodological), 83(2):215–246, 2021. doi:10.1111/rssb.12398.
  76. On Causal and Anticausal Learning. In Proceedings of the 29th International Conference on Machine Learning, ICML, pages 459–466. Omnipress, 2012. doi:10.48550/arXiv.1206.6471.
  77. R. D. Shah and J. Peters. The Hardness of Conditional Independence Testing and the Generalised Covariance Measure. The Annals of Statistics, 48(3):1514–1538, 2020. doi:10.1214/19-aos1857.
  78. S. Shimizu. LiNGAM: Non-Gaussian Methods for Estimating Causal Structures. Behaviormetrika, 41(1):65–98, 2014. doi:10.2333/bhmk.41.65.
  79. S. Siegfried and T. Hothorn. Count Transformation Models. Methods in Ecology and Evolution, 11(7):818–827, 2020. doi:10.1111/2041-210x.13383.
  80. cotram: Count Transformation Models, 2021. URL https://CRAN.R-project.org/package=cotram. R package version 0.3-1.
  81. Causation, Prediction, and Search. MIT press, 2000. doi:10.7551/mitpress/1754.001.0001.
  82. Approximate Kernel-Based Conditional Independence Tests for Fast Non-Parametric Causal Discovery. Journal of Causal Inference, 7(1):20180017, 2019. doi:10.1515/jci-2018-0017.
  83. B. Tamási and T. Hothorn. tramME: Mixed-Effects Transformation Models Using Template Model Builder. The R Journal, 13(2):398–418, 2021. doi:10.32614/RJ-2021-075.
  84. J. Tian and J. Pearl. Causal Discovery from Changes. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, 2001. doi:10.48550/arXiv.1301.2312.
  85. G. Tutz. Regression for Categorical Data, volume 34. Cambridge University Press, 2011. doi:10.1017/cbo9780511842061.
  86. A. W. Van der Vaart. Asymptotic Statistics, volume 3. Cambridge university press, 2000. doi:10.1017/CBO9780511802256.
  87. T. Verma and J. Pearl. Causal Networks: Semantics and Expressiveness. In Machine Intelligence and Pattern Recognition, volume 9, pages 69–76. Elsevier, 1990. doi:10.48550/arXiv.1304.2379.
  88. A. S. Whittemore. Collapsibility of Multidimensional Contingency Tables. Journal of the Royal Statistical Society: Series B (Methodological), 40(3):328–340, 1978. doi:10.1111/j.2517-6161.1978.tb01046.x.
  89. A. Wienke. Frailty models in survival analysis. CRC press, 2010.
  90. ranger: A Fast Implementation of Random Forests, 2022. URL https://CRAN.R-project.org/package=ranger. R package version 0.14.1.
  91. Kernel-Based Conditional Independence Test and Application in Causal Discovery. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, pages 804–813. AUAI Press, 2011. doi:10.48550/arXiv.1202.3775.