ReACC: A Retrieval-Augmented Code Completion Framework (2203.07722v1)

Published 15 Mar 2022 in cs.SE, cs.AI, and cs.CL

Abstract: Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development. Recent work has proved that statistical language modeling with transformers can greatly improve the performance in the code completion task via learning from large-scale source code datasets. However, current approaches focus only on code context within the file or project, i.e. internal context. Our distinction is utilizing "external" context, inspired by human behaviors of copying from the related code snippets when writing code. Specifically, we propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval. We adopt a stage-wise training approach that combines a source code retriever and an auto-regressive language model for programming language. We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
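The retrieve-then-complete idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the retriever or how the retrieved code is combined with the context, so the sketch assumes a simple token-overlap (Jaccard) score as a stand-in for lexical retrieval and assumes the retrieved snippet is prepended to the unfinished code. The function names (`tokenize`, `jaccard`, `retrieve`, `build_input`) and the toy corpus are hypothetical, and the generator itself is omitted.

```python
import re


def tokenize(code):
    """Return the set of identifier and number tokens in a code string."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(context, corpus):
    """Pick the corpus snippet with the highest token overlap with the context.

    Assumption: a purely lexical score stands in for the retriever; a
    semantic (embedding-based) score could be swapped in here.
    """
    query = tokenize(context)
    return max(corpus, key=lambda snippet: jaccard(query, tokenize(snippet)))


def build_input(context, corpus):
    """Prepend the retrieved "external" snippet to the unfinished code.

    Assumption: plain concatenation; the resulting string would be passed
    to an auto-regressive code language model, which is not shown.
    """
    return retrieve(context, corpus) + "\n" + context


# Hypothetical external code database.
corpus = [
    "def read_json(path):\n    with open(path) as f:\n        return json.load(f)",
    "def add(a, b):\n    return a + b",
]

# Unfinished code whose continuation we want the model to predict.
unfinished = "def load_config(path):\n    with open(path) as f:\n        return"

prompt = build_input(unfinished, corpus)
```

Here the file-reading snippet shares most of its tokens with the unfinished function, so it is the one placed ahead of the context; a generator conditioned on `prompt` could then copy or adapt the `json.load(f)` call.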

Authors (6)
  1. Shuai Lu (91 papers)
  2. Nan Duan (172 papers)
  3. Hojae Han (5 papers)
  4. Daya Guo (37 papers)
  5. Seung-won Hwang (59 papers)
  6. Alexey Svyatkovskiy (30 papers)
Citations (117)
