
Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data (2407.13054v2)

Published 17 Jul 2024 in cs.AI

Abstract: Causal analysis has become an essential component in understanding the underlying causes of phenomena across various fields. Despite its significance, the existing literature on causal discovery algorithms is fragmented: methodologies are inconsistent, with no universal classification standard for existing methods, and evaluations are not comprehensive, since data characteristics are rarely analyzed jointly when benchmarking algorithms. This study addresses these gaps by conducting an exhaustive review and empirical evaluation of causal discovery methods on numerical data, aiming to provide a clearer and more structured understanding of the field. Our research begins with a comprehensive literature review spanning over two decades, analyzing over 200 academic articles and identifying more than 40 representative algorithms. This extensive analysis leads to the development of a structured taxonomy tailored to the complexities of causal discovery, categorizing methods into six main types. To address the lack of comprehensive evaluations, our study conducts an extensive empirical assessment of 29 causal discovery algorithms on multiple synthetic and real-world datasets. We categorize synthetic datasets based on size, linearity, and noise distribution, employ five evaluation metrics, and summarize the top-3 algorithm recommendations, providing guidelines for users in various data scenarios. Our results highlight a significant impact of dataset characteristics on algorithm performance. Moreover, a metadata extraction strategy with an accuracy exceeding 80% is developed to assist users in algorithm selection on unknown datasets. Based on these insights, we offer professional and practical guidelines to help users choose the most suitable causal discovery methods for their specific dataset.
