Aroma: Code Recommendation via Structural Code Search (1812.01158v4)

Published 4 Dec 2018 in cs.SE

Abstract: Programmers often write code that has similarity to existing code written somewhere. A tool that could help programmers to search such similar code would be immensely useful. Such a tool could help programmers to extend partially written code snippets to completely implement necessary functionality, help to discover extensions to the partial code which are commonly included by other programmers, help to cross-check against similar code written by other programmers, or help to add extra code which would fix common mistakes and errors. We propose Aroma, a tool and technique for code recommendation via structural code search. Aroma indexes a huge code corpus including thousands of open-source projects, takes a partial code snippet as input, searches the corpus for method bodies containing the partial code snippet, and clusters and intersects the results of the search to recommend a small set of succinct code snippets which both contain the query snippet and appear as part of several methods in the corpus. We evaluated Aroma on 2000 randomly selected queries created from the corpus, as well as 64 queries derived from code snippets obtained from Stack Overflow, a popular website for discussing code. We implemented Aroma for 4 different languages, and developed an IDE plugin for Aroma. Furthermore, we conducted a study where we asked 12 programmers to complete programming tasks using Aroma, and collected their feedback. Our results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently.

Summary

The paper introduces Aroma, a tool that uses structural code search to provide context-aware code recommendations.
Aroma processes a query in phases (featurization, light-weight search, pruning and reranking, and clustering and intersection) to match partial code snippets against a large corpus.
Evaluations show Aroma delivers exact recommendations for 74% of queries with an average response time of 1.6 seconds.

Overview of Aroma: Code Recommendation via Structural Code Search

The paper "Aroma: Code Recommendation via Structural Code Search" introduces a code recommendation tool that assists programmers through structural code search. Aroma addresses the common scenario where the code a developer is writing resembles code that already exists. Drawing on a large corpus of open-source projects, it takes a partial snippet and recommends extensions to it, such as error handling and other additions that programmers commonly include.

Core Features and Methodology

Aroma indexes code repositories and returns recommendations when given a partial code snippet. It works in four phases: featurization, light-weight search, pruning and reranking, and clustering and intersection. Together, these phases keep the recommendations succinct and relevant.

Featurization: The process begins with extracting structural features from code snippets using a simplified parse tree representation. This representation is generalized across different programming languages, enabling Aroma's applicability beyond just Java.
Light-weight Search: Aroma employs matrix multiplication with sparse vectors to efficiently compute overlap scores between the query and method bodies in the corpus. This phase facilitates rapid retrieval of top methods containing parts of the query snippet.
Prune and Rerank: Through a greedy heuristic for pruning, Aroma refines the search results, emphasizing snippets that best match the query. This phase is crucial for optimizing the similarity between the query and recommendations.
Cluster and Intersect: Using a custom clustering algorithm, Aroma groups similar method bodies and intersects the snippets within each cluster to create a recommendation. This keeps redundancy low and surfaces code that recurs across several methods in the corpus.
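
The first two phases can be illustrated with a minimal sketch. It is not Aroma's implementation: the feature set (identifier tokens plus adjacent-token pairs) is an assumption standing in for the parse-tree features, and the function names `featurize`, `build_index`, `overlap_scores`, and `top_k` are hypothetical. Sets of column ids play the role of sparse binary vectors, so the per-method intersection size equals the result of a sparse matrix-vector product.

```python
import re


def featurize(code):
    """Return a set of crude 'structural' features for a code string.

    Assumption: identifier tokens plus adjacent-token pairs stand in for
    parse-tree features; this is not Aroma's actual feature set.
    """
    tokens = re.findall(r"[A-Za-z_]\w*", code)
    pairs = {a + ">" + b for a, b in zip(tokens, tokens[1:])}
    return set(tokens) | pairs


def build_index(corpus):
    """Assign a column id to every feature and store each method body as
    the set of its nonzero columns (one sparse binary row per method)."""
    vocab, rows = {}, {}
    for name, body in corpus.items():
        rows[name] = {vocab.setdefault(f, len(vocab)) for f in featurize(body)}
    return vocab, rows


def overlap_scores(query, vocab, rows):
    """Score every method against the query.

    For 0/1 vectors, the dot product of a row with the query vector is
    the number of nonzero columns they share, so this loop computes the
    same values as multiplying the feature matrix by the query vector.
    """
    q = {vocab[f] for f in featurize(query) if f in vocab}
    return {name: len(cols & q) for name, cols in rows.items()}


def top_k(scores, k):
    """Names of the k highest-scoring methods, ties broken by name."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical three-method corpus, for illustration only.
corpus = {
    "decode": "Bitmap bmp = BitmapFactory.decodeStream(input, null, options); input.close();",
    "read": "String line = reader.readLine(); reader.close();",
    "sum": "int total = a + b; return total;",
}
vocab, rows = build_index(corpus)
scores = overlap_scores("BitmapFactory.decodeStream(input)", vocab, rows)
print(top_k(scores, 1))  # ['decode']
```

Because the score counts shared features rather than matching contiguous text, a query can match a method even when other code sits between the matching parts.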

Evaluation and Results

The paper evaluates Aroma on a corpus of Java methods collected from GitHub, using queries created from the corpus itself as well as queries derived from Stack Overflow code snippets. Aroma provided exact code recommendations for 74% of the partial code queries derived from real-world examples, and it generated recommendations with an average response time of 1.6 seconds on a 24-core CPU.

A critical facet of Aroma’s methodology is the ability to synthesize non-trivial code recommendations that include configuration, error handling, or post-processing extensions. The paper provides examples where Aroma recommends usage patterns related to configuring objects or handling exceptions, thereby offering practical insights into established coding practices.
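
To illustrate how such extensions can surface, here is a minimal sketch of the prune and intersect steps, simplified to whole lines and identifier tokens. Everything in it is an assumption made for illustration: Aroma works on parse trees rather than lines, the names `features`, `greedy_prune`, and `intersect` are hypothetical, and reusing the pruning routine for intersection is a choice of this sketch rather than a confirmed detail of the tool.

```python
import re


def features(line):
    """Identifier tokens of one line (a crude stand-in for structural features)."""
    return set(re.findall(r"[A-Za-z_]\w*", line))


def greedy_prune(method_lines, target):
    """Greedily keep the lines that cover the most not-yet-covered target features.

    Each round picks the line with the largest gain; pruning stops when no
    remaining line adds a new target feature. Kept lines retain source order.
    """
    kept, covered = set(), set()
    while True:
        best, best_gain = None, 0
        for i, line in enumerate(method_lines):
            if i in kept:
                continue
            gain = len((features(line) & target) - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        kept.add(best)
        covered |= features(method_lines[best]) & target
    return [method_lines[i] for i in sorted(kept)]


def intersect(cluster, query):
    """Prune the first method against the query plus each other cluster member,
    leaving (approximately) the lines the cluster has in common."""
    result = cluster[0]
    for other in cluster[1:]:
        target = features(query)
        for line in other:
            target |= features(line)
        result = greedy_prune(result, target)
    return result


# Hypothetical cluster of two similar methods.
m1 = [
    "InputStream input = open(path);",
    "Bitmap bmp = BitmapFactory.decodeStream(input);",
    "input.close();",
    "log(bmp);",
]
m2 = [
    "InputStream input = open(path);",
    "Bitmap bmp = BitmapFactory.decodeStream(input);",
    "input.close();",
    "return bmp;",
]
query = "BitmapFactory.decodeStream(input)"

print(greedy_prune(m1, features(query)))  # only the decodeStream line survives
print(intersect([m1, m2], query))         # the first three lines of m1
```

In this toy cluster the intersection keeps the stream setup and the `close()` call around the queried line, which is the kind of extra code a recommendation would add to the query. Token-level coverage is a coarse criterion, so on real code this sketch could keep or drop lines that a tree-based approach would treat differently.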

Comparison and Versatility

The paper compares Aroma with existing code search and clone detection tools, such as SourcererCC and conventional TF-IDF-based search techniques. Aroma shows higher recall, particularly for non-contiguous code queries, due to its featurization and pruning approach. The paper also contrasts Aroma with pattern-based code completion tools like GraPacc: Aroma creates recommendations on the fly rather than being limited to pre-mined patterns.

Beyond Java, Aroma has been implemented for Hack, JavaScript, and Python, which supports its design goal of language-agnostic code recommendation. This breadth suggests that it could be integrated into a range of development environments.

Implications and Future Work

Aroma's utility in providing contextually relevant code recommendations points towards a future where AI tools can significantly streamline software development. The tool’s architecture and adaptability hint at the possibility of further enhancements, potentially evolving into an autonomous assistant embedded in intelligent programming environments.

Future developments may include extending Aroma to more complex languages and dynamic contexts. Deep learning techniques might also improve its clustering and intersection phases.

In conclusion, Aroma represents a significant step in code recommendation technology, leveraging structural search to offer practical, non-intrusive insights and aiding developers in the coding process. Its demonstrated success across diverse scenarios reflects both its robustness and the utility such tools can offer in modern software engineering.
