Source Code Retrieval Using Sequence Based Similarity

Published 16 Aug 2013 in cs.SE and cs.IR | (1308.3554v1)

Abstract: Duplicated code degrades the quality of software systems and should, at a minimum, be detected. In this paper, we discuss an approach that improves source code retrieval by using structural information about programs. We developed a lexical parser to extract control statements and method identifiers from Java programs. We propose a similarity measure defined as the ratio of the number of sequentially fully matching statements to the number of sequentially partially matching ones. The measure can be regarded as an extension of set-based similarity indices such as the Sørensen-Dice index. The key contribution of this research is a similarity retrieval algorithm that derives meaningful search conditions from a given sequence and then performs retrieval using all of the derived conditions. Experiments show that our retrieval model outperforms the other retrieval models by up to 90.9% in the number of retrieved methods.
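
To illustrate why a sequence-based measure can differ from a set-based one, the sketch below compares the standard Sørensen-Dice index with an order-aware analogue. The abstract does not give the exact formula for the proposed measure, so this is not the paper's definition: the order-aware variant here is an assumption that swaps the set intersection for the length of the longest common subsequence, and the function names and token lists are hypothetical.

```python
def dice_index(a, b):
    """Sørensen-Dice index of two collections treated as sets: 2|A∩B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sets are identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def lcs_length(a, b):
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_dice(seq_a, seq_b):
    """Order-aware analogue (an assumption, not the paper's measure):
    2 * LCS(a, b) / (len(a) + len(b))."""
    if not seq_a and not seq_b:
        return 1.0
    return 2 * lcs_length(seq_a, seq_b) / (len(seq_a) + len(seq_b))


# Hypothetical token sequences, as a lexical parser might extract them
# from two Java methods (control statements and method identifiers).
method_a = ["if", "for", "read", "write"]
method_b = ["for", "if", "write", "read"]

set_score = dice_index(method_a, method_b)
seq_score = sequence_dice(method_a, method_b)
print(set_score, seq_score)
```

The two methods contain the same tokens, so the set-based index cannot tell them apart, while the order-aware variant penalizes the different statement order. That is the kind of structural information a set-based index discards.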
