
Evolutionary Causal Discovery with Relative Impact Stratification for Interpretable Data Analysis (2404.16361v1)

Published 25 Apr 2024 in cs.LG, cs.NE, and cs.SC

Abstract: This study proposes Evolutionary Causal Discovery (ECD), a causal discovery method that tailors response variables, predictor variables, and corresponding operators to research datasets. After using genetic programming to parse variable relationships, the method applies the Relative Impact Stratification (RIS) algorithm to assess the relative impact of predictor variables on the response variable, which facilitates expression simplification and enhances the interpretability of variable relationships. ECD uses an expression tree to visualize the RIS results, offering a depiction of unknown causal relationships that differs from conventional causal discovery. The ECD method represents an evolution and augmentation of existing causal discovery methods, providing an interpretable approach for analyzing variable relationships in complex systems, particularly in healthcare settings with Electronic Health Record (EHR) data. Experiments on both synthetic and real-world EHR datasets demonstrate the efficacy of ECD in uncovering patterns and mechanisms among variables, maintaining high accuracy and stability across different noise levels. On the real-world EHR dataset, ECD reveals the intricate relationships between the response variable and the predictor variables, aligning with the results of structural equation modeling and Shapley additive explanations (SHAP) analyses.
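
The abstract does not say how RIS scores each predictor, so the following is only a minimal sketch of the general idea (rank predictors by their relative impact on the response, then group them into strata), not the paper's algorithm. It assumes a permutation-based sensitivity measure as a stand-in, and the names `relative_impact`, `stratify`, `toy_expression`, and the thresholds `high` and `low` are all hypothetical.

```python
import random
from statistics import mean


def relative_impact(expr, rows, names, seed=0):
    """Return each predictor's share of the total output change.

    Assumption: impact is measured by shuffling one predictor at a time
    and averaging the absolute change in the expression's output. This is
    a stand-in for RIS, whose scoring rule the abstract does not give.
    """
    rng = random.Random(seed)
    baseline = [expr(**row) for row in rows]
    raw = {}
    for name in names:
        shuffled = [row[name] for row in rows]
        rng.shuffle(shuffled)
        perturbed = [expr(**{**row, name: value})
                     for row, value in zip(rows, shuffled)]
        raw[name] = mean(abs(p - b) for p, b in zip(perturbed, baseline))
    total = sum(raw.values()) or 1.0  # avoid division by zero
    return {name: score / total for name, score in raw.items()}


def stratify(impacts, high=0.5, low=0.1):
    """Group predictors into tiers using hypothetical share thresholds."""
    tiers = {"high": [], "medium": [], "low": []}
    for name, share in sorted(impacts.items(), key=lambda kv: -kv[1]):
        if share >= high:
            tiers["high"].append(name)
        elif share < low:
            tiers["low"].append(name)
        else:
            tiers["medium"].append(name)
    return tiers


def toy_expression(x1, x2, x3):
    """Stand-in for an expression found by genetic programming."""
    return 5.0 * x1 + 2.0 * x2 + 0.0 * x3  # x3 has no effect on the output


data_rng = random.Random(42)
rows = [{name: data_rng.uniform(-1.0, 1.0) for name in ("x1", "x2", "x3")}
        for _ in range(200)]
impacts = relative_impact(toy_expression, rows, ["x1", "x2", "x3"])
tiers = stratify(impacts)
print(impacts)
print(tiers)
```

In this toy setup a predictor that lands in the lowest tier, such as `x3`, would be a candidate for removal when simplifying the expression, which is the role the abstract attributes to RIS.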

