RAG or Fine-tuning? A Comparative Study on LCMs-based Code Completion in Industry (2505.15179v1)

Published 21 May 2025 in cs.SE

Abstract: Code completion, a crucial practice in industrial settings, helps developers improve programming efficiency by automatically suggesting code snippets during development. With the emergence of Large Code Models (LCMs), this field has witnessed significant advancements. Due to the natural differences between open-source and industrial codebases, such as coding patterns and unique internal dependencies, it is common practice for developers to perform domain adaptation when adopting LCMs in industry. Multiple adaptation approaches exist, among which retrieval-augmented generation (RAG) and fine-tuning (FT) are the two most popular paradigms. However, no prior research has explored the trade-offs between the two approaches in industrial scenarios. To bridge this gap, this paper comprehensively compares the two paradigms for industrial code completion. In collaboration with Tencent's WXG department, we collect over 160,000 internal C++ files as our codebase. We then compare the two types of adaptation approaches along three dimensions of concern to industrial practitioners, namely effectiveness, efficiency, and parameter sensitivity, using six LCMs. Our findings reveal that RAG, when implemented with appropriate embedding models that map code snippets into dense vector representations, can achieve higher accuracy than fine-tuning alone. Specifically, BM25 presents superior retrieval effectiveness and efficiency among the studied RAG methods. Moreover, RAG and fine-tuning are orthogonal, and their combination leads to further improvement. We also observe that RAG demonstrates better scalability than FT, showing more sustained performance gains as the codebase grows.

Summary

A Comparative Study on Industrial Code Completion Using Large Code Models

The paper "RAG or Fine-tuning? A Comparative Study on LCMs-based Code Completion in Industry" provides a rigorous examination of two primary methodologies—Retrieval-Augmented Generation (RAG) and fine-tuning—used to enhance code completion capabilities within industrial settings utilizing Large Code Models (LCMs). Authored by a team from the Chinese University of Hong Kong and Tencent, the research is conducted on a substantial dataset from Tencent's WXG department, which comprises over 160,000 internal C++ files. This paper addresses the particular challenges posed by the complex and proprietary nature of industrial codebases, distinguishing significantly from more generalized open-source models.

Research Overview

Code completion is an integral component of modern Integrated Development Environments (IDEs), significantly advancing developer productivity by predicting subsequent code segments. The deployment of LCMs in industrial scenarios faces hurdles due to disparities in coding styles and dependencies between public and proprietary codebases, thus necessitating domain-specific adaptation strategies. Among these, RAG and fine-tuning represent prominent approaches, each offering unique advantages and challenges that have not been thoroughly investigated in industrial contexts.

The authors assess these adaptation methods across several dimensions—effectiveness, efficiency, and parameter sensitivity—using six different LCMs, including DeepSeek-Coder and Qwen2.5-Coder, with model sizes ranging from 0.5 billion to 7 billion parameters. The paper applies both similarity-based and dependency-based retrieval within RAG frameworks and explores strategic fine-tuning of LCMs on a comprehensive industrial dataset.
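
To make the similarity-based variant concrete, the sketch below wires BM25 retrieval into prompt construction for an LCM. It is a minimal illustration rather than the paper's implementation: the rank_bm25 package, the toy C++ corpus, and the whitespace tokenizer are all stand-ins.

```python
# Minimal sketch of similarity-based RAG for code completion.
# Assumes the rank_bm25 package (pip install rank-bm25); corpus, tokenizer,
# and prompt format are illustrative, not the paper's implementation.
from rank_bm25 import BM25Okapi

def tokenize(code: str) -> list[str]:
    # Whitespace tokenization keeps the sketch simple; a production system
    # would likely use a code-aware tokenizer.
    return code.split()

# Index the internal codebase (here: a toy corpus of C++ snippets).
corpus = [
    "int Add(int a, int b) { return a + b; }",
    "std::string Trim(const std::string& s);",
    "int Mul(int a, int b) { return a * b; }",
]
bm25 = BM25Okapi([tokenize(snippet) for snippet in corpus])

def build_prompt(unfinished_code: str, top_k: int = 2) -> str:
    # Retrieve the most lexically similar snippets and prepend them as
    # commented-out context so the LCM can condition on them.
    retrieved = bm25.get_top_n(tokenize(unfinished_code), corpus, n=top_k)
    context = "\n".join(f"// retrieved: {s}" for s in retrieved)
    return f"{context}\n{unfinished_code}"

# The assembled prompt would then be fed to the LCM for completion.
print(build_prompt("int Sub(int a, int b) {"))
```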

Key Findings

Performance Improvements: Both RAG and fine-tuning significantly enhance code completion performance; however, RAG, particularly with BM25 similarity-based retrieval, demonstrates superior gains in accuracy and efficiency. Fine-tuning alone yields substantial improvements but does not reach the performance ceiling achieved when retrieval is also applied.
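
For reference, the standard Okapi BM25 scoring function (a textbook formulation, not reproduced from the paper) scores a document D against a query Q as:

```latex
% Okapi BM25; k_1 and b are free parameters (typical defaults: k_1 ~ 1.2, b ~ 0.75).
\mathrm{BM25}(D, Q) =
  \sum_{t \in Q} \mathrm{IDF}(t)\,
  \frac{f(t, D)\,(k_1 + 1)}
       {f(t, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \ln \frac{N - n(t) + 0.5}{n(t) + 0.5}
```

Here f(t, D) is the frequency of term t in document D, |D| the document length, avgdl the average document length over the corpus, N the corpus size, and n(t) the number of documents containing t. Because the score relies only on lexical statistics, it is cheap to compute, which is consistent with the efficiency advantage reported for BM25.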

Complementary Benefits: RAG and fine-tuning are orthogonal, and combining them outperforms either technique alone: the combination harnesses retrieved context alongside domain-specific tuning, yielding higher exact match and BLEU scores.
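
As a rough illustration of these two metrics, the sketch below computes exact match and a smoothed sentence-level BLEU over whitespace tokens. The tokenization and smoothing choices are assumptions made for the example, not the paper's exact evaluation harness; NLTK is required.

```python
# Sketch of the two effectiveness metrics: exact match and BLEU.
# Assumes NLTK is installed; whitespace tokenization is illustrative.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def exact_match(prediction: str, reference: str) -> bool:
    # A completion counts as correct only if it equals the reference
    # after whitespace normalization.
    return prediction.split() == reference.split()

def bleu(prediction: str, reference: str) -> float:
    # Sentence-level BLEU over code tokens; smoothing avoids zero scores
    # on short completions that miss some higher-order n-grams.
    smooth = SmoothingFunction().method1
    return sentence_bleu([reference.split()], prediction.split(),
                         smoothing_function=smooth)

pred, ref = "return a + b ;", "return a + b ;"
print(exact_match(pred, ref), f"{bleu(pred, ref):.3f}")
```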

Efficiency Trade-offs: RAG incurs minimal preparation costs but introduces runtime overhead on every request due to context retrieval. In contrast, fine-tuning demands extensive computational resources up front during training, so industrial deployments must weigh one-off training cost against per-request latency.
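
This trade-off can be made concrete with a small timing harness. Everything below is hypothetical: the naive token-overlap retriever stands in for BM25, and the synthetic corpus and request count are made up purely to show where RAG's per-request cost appears.

```python
# Hypothetical illustration of RAG's per-request retrieval overhead.
# Fine-tuning, by contrast, pays its compute cost once at training time
# and adds no extra latency at inference.
import time

def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    # Placeholder retriever: naive token-overlap scoring over the corpus.
    q = set(query.split())
    return sorted(corpus, key=lambda s: -len(q & set(s.split())))[:top_k]

corpus = [f"void Handler{i}(int request_id);" for i in range(10_000)]

n_requests = 100
start = time.perf_counter()
for _ in range(n_requests):
    retrieve("int Sub(int a, int b)", corpus)
per_request_ms = (time.perf_counter() - start) / n_requests * 1e3
print(f"retrieval overhead: {per_request_ms:.2f} ms per completion request")
```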

Scalability: While fine-tuning shows diminishing returns beyond a certain codebase size, RAG methods continue to extract meaningful context from larger datasets, indicating better scalability for broader codebase applications.

Implications and Future Directions

The research provides actionable insights for both developers and researchers. For practitioners, adapting LCMs through RAG offers efficient domain alignment and is especially attractive in data-rich scenarios, where it scales well to expansive codebases. Additionally, the paper suggests further exploration of snippet-level retrieval and advanced code embedding techniques to mitigate runtime overhead.

For researchers, the paper highlights several avenues for advancement, such as developing more robust fine-tuning techniques that maintain generalization across diverse code intelligence tasks, refining long-context inference approaches, and optimizing lexical-based retrieval mechanisms.

Overall, the paper offers a significant contribution to understanding the adaptation of LCMs within proprietary industrial settings, paving the way for enhanced AI-driven coding tools that are specifically tailored to complex and evolving business needs. This comparative evaluation serves as a foundational reference for subsequent advancements in AI-powered software development tools.
