
An Empirical Study of Fault Localization Families and Their Combinations (1803.09939v2)

Published 27 Mar 2018 in cs.SE

Abstract: The performance of fault localization techniques is critical to their adoption in practice. This paper reports on an empirical study of a wide range of fault localization techniques on real-world faults. Different from previous studies, this paper (1) considers a wide range of techniques from different families, (2) combines different techniques, and (3) considers the execution time of different techniques. Our results reveal that a combined technique significantly outperforms any individual technique (200% increase in faults localized in Top 1), suggesting that combination may be a desirable way to apply fault localization techniques and that future techniques should also be evaluated in the combined setting. Our implementation is publicly available for evaluating and combining fault localization techniques.


Summary

  • The paper empirically studies seven families of fault localization techniques and finds that combining them significantly outperforms individual methods, achieving a 200% increase in faults localized within the top 1 position.
  • Using the Defects4J dataset, the study shows weak correlations between techniques, implying their information sources are complementary and explaining the benefit of combined approaches.
  • The authors make their experimental infrastructure, CombineFL-core, publicly available, providing a valuable resource for future research in automated debugging.

Empirical Study on Fault Localization Techniques

This paper presents a comprehensive empirical study of fault localization techniques, examining methods from several families and their implications for automated debugging. Fault localization is a critical task in software engineering: accurately identifying the code responsible for a defect can make debugging and development considerably more efficient.

The authors investigate seven families of fault localization techniques: Spectrum-Based Fault Localization (SBFL), Mutation-Based Fault Localization (MBFL), dynamic program slicing, stack trace analysis, predicate switching, information-retrieval-based fault localization (IR-based FL), and history-based fault localization. Each family utilizes different data sources, ranging from test coverage and program mutations to crash reports and development histories. The paper leverages the Defects4J dataset, consisting of 357 real-world software faults, to assess these techniques.

A notable aspect of this paper is its exploration of combining these techniques with a learning-to-rank model. The paper employs RankSVM to integrate techniques from different families, forming a new approach referred to as CombineFL. The findings indicate that CombineFL significantly outperforms any standalone technique, achieving a 200% increase in faults localized within the top 1 position.
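The combination idea can be illustrated with a small pairwise ranker. This is a minimal sketch, not the paper's RankSVM setup: it assumes each program element has one suspiciousness score per technique, and it fits a linear model with a pairwise hinge loss (the objective RankSVM is built around) using plain subgradient descent. The feature values, the function names, and the hyperparameters are all made up for illustration.

```python
from typing import List

Vector = List[float]


def pairwise_differences(features: List[Vector], labels: List[int]) -> List[Vector]:
    """For one fault, build (faulty - non-faulty) feature differences."""
    faulty = [f for f, y in zip(features, labels) if y == 1]
    clean = [f for f, y in zip(features, labels) if y == 0]
    return [[a - b for a, b in zip(fa, cl)] for fa in faulty for cl in clean]


def train_linear_ranker(diffs: List[Vector], epochs: int = 200,
                        lr: float = 0.1, reg: float = 0.01) -> Vector:
    """Minimise the pairwise hinge loss max(0, 1 - w.d) plus L2 regularisation."""
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            for i in range(len(w)):
                # The hinge term contributes -d[i] only while the margin is violated.
                grad = reg * w[i] - (d[i] if margin < 1 else 0.0)
                w[i] -= lr * grad
    return w


def rank(features: List[Vector], w: Vector) -> List[int]:
    """Return element indices ordered from most to least suspicious."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])


# Hypothetical scores per element: [SBFL, MBFL, stack trace]. Element 0 is faulty.
train_features = [[0.9, 0.2, 1.0], [0.8, 0.1, 0.0], [0.3, 0.6, 0.0], [0.1, 0.0, 0.0]]
train_labels = [1, 0, 0, 0]

weights = train_linear_ranker(pairwise_differences(train_features, train_labels))

# A second, equally hypothetical fault ranked with the learned weights.
new_features = [[0.5, 0.5, 0.0], [0.6, 0.1, 1.0]]
new_ranking = rank(new_features, weights)
```

The learned weights express how much each technique's score should count when the scores disagree, which is the mechanism by which a combined ranking can beat every individual one.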

In the standalone evaluation, SBFL with the Ochiai and DStar formulas ranks highest among individual techniques, especially for faults localized within the top 10 positions. The results also illustrate the potential advantage of platform-agnostic techniques like stack trace analysis for specific fault types such as crashes. The correlation analysis reveals that most techniques are only weakly correlated, implying that their information sources are complementary, which reinforces the rationale for combining them.
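For readers unfamiliar with the two SBFL formulas named above, the sketch below computes them from coverage counts. The symbols are the usual ones: `ef` and `ep` are the numbers of failing and passing tests that execute an element, and `nf` is the number of failing tests that do not. The exponent 2 for DStar is a common choice and is assumed here, and the handling of zero denominators is a convention of this sketch rather than something taken from the paper.

```python
import math


def ochiai(ef: int, ep: int, nf: int) -> float:
    """Ochiai: ef / sqrt((ef + nf) * (ef + ep))."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def dstar(ef: int, ep: int, nf: int, star: int = 2) -> float:
    """DStar: ef**star / (ep + nf)."""
    denom = ep + nf
    if denom == 0:
        # Executed by every failing test and by no passing test.
        return float("inf") if ef > 0 else 0.0
    return ef ** star / denom


# An element executed by both failing tests and by no passing test.
always_failing = ochiai(ef=2, ep=0, nf=0)
# An element executed by 2 failing and 1 passing test, missed by 1 failing test.
mixed = dstar(ef=2, ep=1, nf=1)
```

Both formulas reward elements that failing tests execute and passing tests avoid; they differ in how strongly they weight the failing-test evidence.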

Execution time is another critical dimension of the analysis. Techniques are grouped by how long they take to run, from quick methods such as history-based localization to time-intensive approaches such as MBFL. The best strategy therefore depends on the time available, and the paper outlines an efficient combination for each time budget.
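The budget-driven choice described above can be sketched as a simple filter. The cost figures below are hypothetical placeholders, not measurements from the paper; only the relative position of history-based localization (cheap) and MBFL (expensive) follows the text. The function name and the rule of keeping every family whose own cost fits the budget are likewise assumptions of this sketch.

```python
from typing import Dict, List

# Hypothetical per-fault cost estimates in seconds (placeholders for illustration only).
ASSUMED_COST_SECONDS: Dict[str, int] = {
    "history": 5,
    "stack_trace": 5,
    "ir_based": 10,
    "sbfl": 200,
    "slicing": 300,
    "predicate_switching": 600,
    "mbfl": 5000,
}


def families_within_budget(costs: Dict[str, int], budget_seconds: int) -> List[str]:
    """Keep each family whose own estimated cost fits the budget, cheapest first."""
    affordable = [(cost, name) for name, cost in costs.items() if cost <= budget_seconds]
    return [name for _, name in sorted(affordable)]


quick_combination = families_within_budget(ASSUMED_COST_SECONDS, budget_seconds=60)
full_combination = families_within_budget(ASSUMED_COST_SECONDS, budget_seconds=10_000)
```

The selected families would then supply the feature columns for a combined ranking model, so a tighter budget simply means fewer features.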

By discussing the implications of these results, the paper sets a clear agenda for future fault localization research. It suggests that developing new techniques should focus on incorporating distinct information sources and that evaluations should consider a technique's contribution to combined approaches rather than its standalone performance.

Moreover, the paper makes its experimental infrastructure, CombineFL-core, publicly available, creating a valuable resource for future research in automated debugging. This release facilitates efficient evaluation and comparison, emphasizing the paper's contribution to the broader software engineering community.

In conclusion, this paper provides a substantive contribution to the field by empirically validating the benefits of combining fault localization techniques and offering actionable insights into their application. Future research might focus on exploring new combination strategies and integrating novel information sources to enhance fault localization performance further.