
Towards Understanding the Impacts of Textual Dissimilarity on Duplicate Bug Report Detection (2212.09976v1)

Published 20 Dec 2022 in cs.SE

Abstract: About 40% of software bug reports are duplicates of one another, which poses a major overhead during software maintenance. Traditional techniques often focus on detecting duplicate bug reports that are textually similar. However, many duplicate bug reports in bug tracking systems are not textually similar, and traditional techniques might fall short for them. In this paper, we conduct a large-scale empirical study to better understand the impacts of textual dissimilarity on the detection of duplicate bug reports. First, we collect a total of 92,854 bug reports from three open-source systems and construct two datasets containing textually similar and textually dissimilar duplicate bug reports. We then measure the performance of three existing techniques in detecting duplicate bug reports and show that they perform significantly worse on textually dissimilar duplicate reports. Second, we analyze the two groups of bug reports using a combination of descriptive analysis, word embedding visualization, and manual analysis. We find that textually dissimilar duplicate bug reports often miss important components (e.g., expected behaviors and steps to reproduce), which could explain both their textual differences and the poor performance of the existing techniques. Finally, we apply domain-specific embedding to the duplicate bug report detection problem, which shows mixed results. These findings warrant further investigation and more effective solutions for detecting textually dissimilar duplicate bug reports.
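
To make the similar/dissimilar distinction concrete, here is a minimal sketch of how pairs of duplicate reports could be split by a textual similarity score. The abstract does not say which similarity metric or threshold the authors used, so the bag-of-words cosine similarity, the `0.5` threshold, and the function names `cosine_similarity` and `split_duplicates` are all illustrative assumptions, not the paper's method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def split_duplicates(pairs, threshold=0.5):
    """Split (original, duplicate) pairs into similar and dissimilar groups.

    The threshold is a hypothetical cut-off chosen for illustration only.
    """
    similar, dissimilar = [], []
    for original, duplicate in pairs:
        score = cosine_similarity(original, duplicate)
        group = similar if score >= threshold else dissimilar
        group.append((original, duplicate, score))
    return similar, dissimilar


if __name__ == "__main__":
    # Made-up report texts, not taken from the paper's datasets.
    pairs = [
        ("App crashes when saving a file", "Crash when saving file in app"),
        ("App crashes when saving a file", "Data loss after pressing Ctrl+S"),
    ]
    similar, dissimilar = split_duplicates(pairs)
    print(len(similar), "similar pair(s);", len(dissimilar), "dissimilar pair(s)")
```

The second pair describes the same underlying failure but shares no vocabulary with the first report, which is the kind of duplicate a purely lexical detector would be expected to miss.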

Authors (2)
  1. Sigma Jahan (4 papers)
  2. Mohammad Masudur Rahman (44 papers)
Citations (3)
