TRIAD: Automated Traceability Recovery based on Biterm-enhanced Deduction of Transitive Links among Artifacts (2312.16854v2)
Abstract: Traceability allows stakeholders to extract and comprehend the trace links among software artifacts introduced across the software life cycle, and thereby provides significant support for software engineering tasks. Despite its proven benefits, software traceability is challenging to recover and maintain manually. Hence, many approaches to automated traceability have been proposed. Most rely on textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, artifacts at different abstraction levels usually have different textual descriptions, which can greatly hinder the performance of IR-based approaches (e.g., a requirement written in natural language may have little textual similarity to a Java class). In this work, we leverage consensual biterms and transitive relationships (i.e., inner- and outer-transitive links) based on intermediate artifacts to improve IR-based traceability recovery. We first extract and filter biterms from all source, intermediate, and target artifacts. We then use the consensual biterms from the intermediate artifacts to extend the biterms of both source and target artifacts, and finally deduce outer- and inner-transitive links to adjust the text similarities between source and target artifacts. We conducted a comprehensive empirical evaluation on five systems widely used in the literature to show that our approach can outperform four state-of-the-art approaches, and to examine how its performance is affected by different conditions of source, intermediate, and target artifacts. The results indicate that our approach outperforms the baseline approaches by over 15% in AP and over 10% in MAP on average.
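The three steps named in the abstract (biterm extraction, enrichment through intermediate artifacts, and similarity adjustment through transitive links) can be illustrated with a minimal Python sketch. This is not the TRIAD implementation: the function names, the treatment of a biterm as a pair of adjacent tokens, the rule of keeping only biterms shared with an intermediate artifact, and the `weight` parameter are all assumptions made for illustration. The sketch also covers only one kind of transitive path (source to intermediate to target).

```python
def biterms(tokens):
    """Return adjacent-token pairs (an assumed, simplified notion of biterm)."""
    return {(a, b) for a, b in zip(tokens, tokens[1:])}


def enrich(artifact_tokens, intermediate_tokens):
    """Append biterms that the artifact shares with an intermediate artifact.

    Treating shared biterms as the "consensual" ones is an assumption of this
    sketch, not a definition taken from the paper.
    """
    shared = biterms(artifact_tokens) & biterms(intermediate_tokens)
    return artifact_tokens + sorted(a + "_" + b for a, b in shared)


def adjust_with_transitive_links(sim_st, sim_si, sim_it, weight=0.5):
    """Boost source-target scores using the best path through an intermediate.

    sim_st[i][j]: similarity of source i and target j
    sim_si[i][k]: similarity of source i and intermediate k
    sim_it[k][j]: similarity of intermediate k and target j
    weight: hypothetical factor controlling the size of the boost
    """
    adjusted = []
    for i, row in enumerate(sim_st):
        new_row = []
        for j, direct in enumerate(row):
            # Strength of the strongest source -> intermediate -> target path.
            via = max(
                (sim_si[i][k] * sim_it[k][j] for k in range(len(sim_it))),
                default=0.0,
            )
            new_row.append(direct + weight * via)
        adjusted.append(new_row)
    return adjusted


if __name__ == "__main__":
    requirement = ["upload", "file", "server"]
    use_case = ["user", "upload", "file"]
    print(enrich(requirement, use_case))

    sim_st = [[0.1, 0.2]]              # one source, two targets
    sim_si = [[0.9, 0.0]]              # one source, two intermediates
    sim_it = [[0.8, 0.0], [0.0, 0.5]]  # two intermediates, two targets
    print(adjust_with_transitive_links(sim_st, sim_si, sim_it))
```

In this toy input, the first target gains score because the source and the target are both strongly related to the same intermediate artifact, while the second target keeps its direct similarity. A real pipeline would compute the similarity matrices with an IR model after enrichment.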