
Automatic Traceability Maintenance via Machine Learning Classification (1807.06684v1)

Published 17 Jul 2018 in cs.SE

Abstract: Previous studies have shown that software traceability, the ability to link together related artifacts from different sources within a project (e.g., source code, use cases, documentation, etc.), improves project outcomes by assisting developers and other stakeholders with common tasks such as impact analysis, concept location, etc. Establishing traceability links in a software system is an important and costly task, but only half the struggle. As the project undergoes maintenance and evolution, new artifacts are added and existing ones are changed, resulting in outdated traceability information. Therefore, specific steps need to be taken to make sure that traceability links are maintained in tandem with the rest of the project. In this paper we address this problem and propose a novel approach called TRAIL for maintaining traceability information in a system. The novelty of TRAIL stands in the fact that it leverages previously captured knowledge about project traceability to train a machine learning classifier which can then be used to derive new traceability links and update existing ones. We evaluated TRAIL on 11 commonly used traceability datasets from six software systems and compared it to seven popular Information Retrieval (IR) techniques including the most common approaches used in previous work. The results indicate that TRAIL outperforms all IR approaches in terms of precision, recall, and F-score.

Citations (42)

Summary

  • The paper introduces TRAIL, a machine learning classifier that maintains evolving software traceability links accurately.
  • It utilizes historical traceability knowledge to generate and update links, reducing the manual maintenance burden.
  • Evaluations on 11 datasets show that TRAIL outperforms traditional IR methods in precision, recall, and F-score.

The paper "Automatic Traceability Maintenance via Machine Learning Classification" introduces an approach to maintaining software traceability, the practice of linking related artifacts in a project, such as source code, use cases, and documentation. Well-maintained traceability improves project outcomes by supporting tasks such as impact analysis and concept location.

A core challenge in software traceability is not just creating the initial traceability links but also maintaining them as the project evolves. New artifacts continuously emerge, and existing ones may undergo alterations, causing the traceability information to become outdated. Addressing this challenge, the authors propose a novel method called TRAIL.

TRAIL: A Machine Learning-based Approach

TRAIL stands apart from traditional methods by utilizing machine learning to enhance the maintenance of traceability links. The approach involves leveraging previously established traceability knowledge to train a classifier. This classifier is then employed to generate new traceability links and update existing ones, thus ensuring that the traceability information remains current as the project grows and changes.
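The idea of training a classifier on already-known links can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not specify TRAIL's features or learning algorithm, so the single Jaccard-overlap feature, the logistic regression model, and the toy artifact texts below are all assumptions chosen for brevity.

```python
import math
import re

def tokens(text):
    """Set of lowercase word tokens in an artifact's text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def features(source, target):
    """Hypothetical feature vector for an artifact pair: bias term + Jaccard overlap."""
    a, b = tokens(source), tokens(target)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    return [1.0, jaccard]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(history, epochs=2000, lr=0.5):
    """Fit logistic regression by batch gradient descent on known links and non-links."""
    xs = [features(s, t) for (s, t), _ in history]
    ys = [label for _, label in history]
    w = [0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(len(w)):
                grad[i] += (p - y) * x[i]
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w

def predict(w, source, target):
    """True if the trained model considers the pair a traceability link."""
    x = features(source, target)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5

# Previously captured traceability knowledge (toy data): 1 = link, 0 = no link.
login_uc = "user logs in with password"
export_uc = "export report as pdf"
login_code = "def login(user, password): check password"
export_code = "def export_pdf(report): write pdf report"
history = [
    ((login_uc, login_code), 1),
    ((export_uc, export_code), 1),
    ((login_uc, export_code), 0),
    ((export_uc, login_code), 0),
]
weights = train(history)

# A newly added artifact is classified against existing ones.
new_code = "def reset_password(user): send email"
is_link = predict(weights, "user resets password", new_code)
not_link = predict(weights, export_uc, new_code)
```

A realistic classifier would use many more features and far more training pairs; the sketch only shows the workflow of learning from existing links and then classifying pairs that involve new or changed artifacts.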

Evaluation and Comparison

TRAIL was evaluated on 11 traceability datasets drawn from six software systems. The authors compared its performance against seven Information Retrieval (IR) techniques, including the approaches most commonly used in previous work.
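For contrast with the classifier-based approach, the following sketch shows how an IR-style technique typically recovers links: it ranks candidate artifacts by textual similarity to a query artifact. The summary does not name the seven techniques used in the paper, so the TF-IDF vector space model below is an assumption, picked only as a familiar example of IR-based ranking; the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens, keeping duplicates for term frequency."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of documents."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / count) + 1.0 for t, count in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_targets(query, targets):
    """Return (score, target) pairs sorted from most to least similar."""
    vectors = tfidf_vectors([query] + targets)
    scored = [(cosine(vectors[0], vec), t) for vec, t in zip(vectors[1:], targets)]
    return sorted(scored, reverse=True)

targets = [
    "def login(user, password): check password",
    "def export_pdf(report): write pdf report",
]
ranked = rank_targets("user logs in with password", targets)
```

An IR approach of this kind needs a similarity cutoff or a top-k rule to turn the ranking into links, and it does not learn from links that were already established.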

According to the paper, TRAIL outperformed all seven IR approaches on three metrics:

  • Precision: the fraction of links proposed by TRAIL that are correct.
  • Recall: the fraction of all true traceability links that TRAIL identifies.
  • F-score: the harmonic mean of precision and recall, which balances the two in a single measure.
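As a worked illustration of these three metrics, the sketch below computes them for a set of proposed links against a set of true links. The artifact names and link sets are made up for the example and do not come from the paper's datasets.

```python
def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and F-score for two sets of links."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical links as (source artifact, target artifact) pairs.
predicted_links = {("UC1", "A.java"), ("UC1", "B.java"), ("UC2", "C.java"), ("UC3", "D.java")}
true_links = {("UC1", "A.java"), ("UC2", "C.java"), ("UC2", "E.java")}

precision, recall, f1 = precision_recall_f1(predicted_links, true_links)
```

Here two of the four proposed links are correct (precision 1/2) and two of the three true links are found (recall 2/3), which gives an F-score of 4/7.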

Conclusion

In summary, the paper presents TRAIL as a machine learning approach to maintaining software traceability. Because it derives and updates traceability links automatically, and did so with higher precision, recall, and F-score than the IR baselines, TRAIL can reduce the effort required to keep traceability information up to date in evolving software projects. The results suggest that it could be a useful tool for developers during software maintenance and evolution.