Semantically Enhanced Software Traceability Using Deep Learning Techniques (1804.02438v1)

Published 6 Apr 2018 in cs.SE

Abstract: In most safety-critical domains the need for traceability is prescribed by certifying bodies. Trace links are generally created among requirements, design, source code, test cases and other artifacts, however, creating such links manually is time consuming and error prone. Automated solutions use information retrieval and machine learning techniques to generate trace links, however, current techniques fail to understand semantics of the software artifacts or to integrate domain knowledge into the tracing process and therefore tend to deliver imprecise and inaccurate results. In this paper, we present a solution that uses deep learning to incorporate requirements artifact semantics and domain knowledge into the tracing solution. We propose a tracing network architecture that utilizes Word Embedding and Recurrent Neural Network (RNN) models to generate trace links. Word embedding learns word vectors that represent knowledge of the domain corpus and RNN uses these word vectors to learn the sentence semantics of requirements artifacts. We trained 360 different configurations of the tracing network using existing trace links in the Positive Train Control domain and identified the Bidirectional Gated Recurrent Unit (BI-GRU) as the best model for the tracing task. BI-GRU significantly out-performed state-of-the-art tracing methods including the Vector Space Model and Latent Semantic Indexing.

Citations (223)

Summary

The paper introduces a deep learning approach that captures the semantic context of software artifacts with a BI-GRU model, improving MAP by 41% over VSM and by 32% over LSI.
The authors trained 360 configurations of a tracing network built on word embeddings and RNN models, and identified BI-GRU as the most effective architecture on the safety-critical PTC dataset.
The paper suggests that training on a larger set of validated trace links could further improve precision and recall; hybrid neural-ontology systems are a possible direction for future work.

Review of "Semantically Enhanced Software Traceability Using Deep Learning Techniques"

The paper "Semantically Enhanced Software Traceability Using Deep Learning Techniques" tackles automated trace link generation in software engineering, specifically within safety-critical systems such as Positive Train Control (PTC). The authors address a weakness of traditional requirements traceability approaches: because they cannot grasp the semantic context of software artifacts, they suffer from term mismatch, where two related artifacts describe the same concept in different words.

Study Summary

The main contribution of the paper is the introduction of a novel approach that leverages deep learning to enhance traceability in software systems by incorporating semantic understanding and domain knowledge. The authors propose a tracing network architecture utilizing Word Embedding and Recurrent Neural Network (RNN) models. Their method diverges from conventional Information Retrieval (IR) techniques by embedding semantics into the traceability process, allowing for a more nuanced detection of relevant links between software artifacts.
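The encoding idea described above can be sketched in a few lines. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: the embedding table is random rather than learned from a domain corpus, bias terms are omitted, and the names `GRUCell` and `bi_gru_encode` are hypothetical. It shows only how a bidirectional GRU turns a sequence of word vectors into one sentence vector by reading the sequence in both directions.

```python
import math
import random
from typing import Dict, List

Vector = List[float]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix: List[Vector], vec: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """Single GRU cell with random (untrained) weights and no bias terms."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        # One input matrix W and one recurrent matrix U for each of:
        # update gate "z", reset gate "r", candidate state "h".
        self.W: Dict[str, List[Vector]] = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrh"}
        self.U: Dict[str, List[Vector]] = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrh"}

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        # Blend the previous state and the candidate state using the update gate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, sequence: List[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        return h

def bi_gru_encode(sequence: List[Vector], fwd: GRUCell, bwd: GRUCell) -> Vector:
    """Concatenate the final states of a forward pass and a backward pass."""
    return fwd.run(sequence) + bwd.run(list(reversed(sequence)))

rng = random.Random(0)
sentence = "train shall stop before signal".split()
# Hypothetical 4-dimensional embeddings; a real system would learn these.
embeddings = {word: [rng.uniform(-1.0, 1.0) for _ in range(4)] for word in sentence}
forward_cell, backward_cell = GRUCell(4, 3, rng), GRUCell(4, 3, rng)
sentence_vector = bi_gru_encode([embeddings[w] for w in sentence], forward_cell, backward_cell)
```

In a trained network, the sentence vectors of a source and a target artifact would then be compared by further learned layers to score a candidate trace link; that comparison step is not shown here.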

By training 360 configurations of their tracing network, the research identifies the Bidirectional Gated Recurrent Unit (BI-GRU) as the most effective model. This model significantly surpassed traditional traceability techniques like the Vector Space Model (VSM) and Latent Semantic Indexing (LSI) on the PTC dataset.
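To make the VSM baseline and its term mismatch weakness concrete, here is a minimal sketch of TF-IDF vectors compared by cosine similarity. It is an illustration under assumptions, not the baseline configuration used in the paper: tokenization is plain whitespace splitting, the IDF formula with `+ 1.0` smoothing is one common variant, and the example sentences are invented.

```python
import math
from collections import Counter
from typing import Dict, List

def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokens, no stemming or stop-word removal.
    return text.lower().split()

def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / doc_freq[term]) + 1.0 for term in doc_freq}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented artifacts: the related design text shares no words with the
# requirement, while the unrelated one happens to reuse several.
requirement = "alert the operator when speed exceeds the limit"
design_related = "warn crew if velocity is above threshold"
design_unrelated = "log the speed limit to the operator console"

req_vec, related_vec, unrelated_vec = tfidf_vectors(
    [requirement, design_related, design_unrelated]
)
related_score = cosine(req_vec, related_vec)
unrelated_score = cosine(req_vec, unrelated_vec)
```

Because the semantically related pair has no terms in common, its score is zero, while the unrelated pair scores higher on word overlap alone. This is the kind of failure that a semantics-aware model is meant to avoid.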

Key Results and Implications

The paper reports that the BI-GRU model achieves higher Mean Average Precision (MAP) than the established baseline techniques. The best BI-GRU configuration of the tracing network improved MAP by 41% over VSM and by 32% over LSI, which supports the claim that integrating semantic understanding into the tracing task is effective.
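For readers unfamiliar with the metric, the sketch below computes MAP using its standard information retrieval definition: for each source artifact, average the precision at every rank where a true link appears, then take the mean over all source artifacts. Whether the paper uses exactly this variant is an assumption, and the artifact identifiers and rankings are invented for illustration.

```python
from typing import Dict, List, Set

def average_precision(ranked_targets: List[str], true_links: Set[str]) -> float:
    """Average precision for one source artifact's ranked candidate list."""
    if not true_links:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, target in enumerate(ranked_targets, start=1):
        if target in true_links:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(true_links)

def mean_average_precision(
    rankings: Dict[str, List[str]], truth: Dict[str, Set[str]]
) -> float:
    """Mean of average precision over source artifacts that have true links."""
    sources = [s for s in rankings if truth.get(s)]
    if not sources:
        return 0.0
    return sum(average_precision(rankings[s], truth[s]) for s in sources) / len(sources)

# Invented example: two requirements, each with a ranked list of design elements.
rankings = {
    "REQ-1": ["D1", "D2", "D3"],
    "REQ-2": ["D2", "D3", "D1"],
}
truth = {
    "REQ-1": {"D1", "D3"},
    "REQ-2": {"D3"},
}
map_score = mean_average_precision(rankings, truth)
```

Here REQ-1 has true links at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2 = 5/6, and REQ-2 has its only true link at rank 2, giving 1/2. The MAP is therefore (5/6 + 1/2) / 2 = 2/3.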

The improvement is most notable at high levels of recall, which matters in safety-critical contexts where near-perfect recall is often required. The paper suggests that training the network on a larger set of validated trace links could yield further gains in precision and recall.

Speculation on Future Developments

The integration of deep learning into software traceability has the potential to extend beyond the current safety-critical context. With further advancements, this approach might be adapted to broader industrial applications, addressing limitations across different software engineering domains where context and semantic understanding are crucial.

Future research could explore hybrid systems combining neural networks with domain-specific ontologies or knowledge bases to further enhance performance. Additionally, the exploration of real-time, adaptive learning systems that refine traceability links as projects evolve presents an exciting avenue.

In conclusion, this paper marks a significant step forward in automating software traceability with semantics-aware models. It highlights the benefit of moving beyond traditional IR methods toward deep learning techniques to tackle longstanding challenges in software engineering traceability.
