Towards Better Graph Neural Network-based Fault Localization Through Enhanced Code Representation (2404.04496v6)
Abstract: Automatic software fault localization plays an important role in software quality assurance by pinpointing faulty locations for easier debugging. Coverage-based fault localization, a widely used technique, applies statistics to coverage spectra to rank code elements by suspiciousness score. However, the rigidity of statistical approaches calls for learning-based techniques. Among these, Grace, a graph neural network (GNN) based technique, has achieved state-of-the-art results because it preserves coverage spectra, i.e., test-to-source coverage relationships, as a precise abstract syntax tree (AST)-enhanced graph representation, mitigating the limitation of other learning-based techniques, which compress the feature representation. However, such a representation struggles to scale as software grows more complex and the associated coverage spectra and AST graphs grow with it. In this work, we propose a new graph representation, DepGraph, that reduces the complexity of the graph representation by 70% in nodes and edges by integrating the interprocedural call graph into the graph representation of the code. Moreover, we integrate additional features, such as code change information, into the graph as attributes so that the model can leverage rich historical project data. We evaluate DepGraph using Defects4J 2.0.0, and it outperforms Grace by locating 20% more faults in Top-1 and improving the Mean First Rank (MFR) and the Mean Average Rank (MAR) by over 50%, while decreasing GPU memory usage by 44% and training/inference time by 85%. Additionally, in cross-project settings, DepGraph surpasses the state-of-the-art baseline with a 42% higher Top-1 accuracy and 68% and 65% improvements in MFR and MAR, respectively. Our study demonstrates DepGraph's robustness, achieving state-of-the-art accuracy and scalability for future extension and adoption.
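To illustrate the statistical baseline the abstract refers to, the following is a minimal sketch of coverage-based ranking. It assumes the Ochiai coefficient as the suspiciousness formula, which is one common choice and is not named in the abstract; the function names, the dictionary-based coverage format, and the toy data are all hypothetical and are not taken from DepGraph or Grace.

```python
import math


def ochiai_scores(coverage, failed):
    """Compute an Ochiai suspiciousness score per code element.

    coverage: dict mapping test name -> set of covered code elements
              (an assumed, simplified form of a coverage spectrum).
    failed:   set of names of failing tests.
    """
    total_failed = len(failed)
    elements = set().union(*coverage.values())
    scores = {}
    for element in elements:
        # ef / ep: failing / passing tests that execute this element.
        ef = sum(1 for t, cov in coverage.items() if t in failed and element in cov)
        ep = sum(1 for t, cov in coverage.items() if t not in failed and element in cov)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[element] = ef / denom if denom > 0 else 0.0
    return scores


def rank_elements(scores):
    """Order elements from most to least suspicious (ties broken by name)."""
    return sorted(scores, key=lambda e: (-scores[e], e))


# Toy spectrum: three tests over three methods, with t2 failing.
coverage = {"t1": {"a", "b"}, "t2": {"b", "c"}, "t3": {"a", "c"}}
failed = {"t2"}
scores = ochiai_scores(coverage, failed)
ranking = rank_elements(scores)
```

In this toy spectrum, methods `b` and `c` are each executed by the failing test and by one passing test, so they tie ahead of `a`, which only passing tests execute. Fixed formulas of this kind are the rigid statistics that learning-based techniques aim to improve on.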