
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond (2403.14734v4)

Published 21 Mar 2024 in cs.SE, cs.AI, cs.CL, and cs.PL

Abstract: Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronological review of the advancements in code intelligence, encompassing over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works. We follow the historical progression to trace the paradigm shifts across different research phases (e.g., from modeling code with recurrent neural networks to the era of LLMs). Concurrently, we highlight the major technical transitions in models, tasks, and evaluations spanning through different stages. For applications, we also observe a co-evolving shift. It spans from initial endeavors to tackling specific scenarios, through exploring a diverse array of tasks during its rapid expansion, to currently focusing on tackling increasingly complex and varied real-world challenges. Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains. Finally, we delve into both the opportunities and challenges associated with this field, alongside elucidating our insights on the most promising research directions. An ongoing, dynamically updated project and resources associated with this survey have been released at https://github.com/QiushiSun/Awesome-Code-Intelligence.

Authors (18)
  1. Qiushi Sun
  2. Zhirui Chen
  3. Fangzhi Xu
  4. Kanzhi Cheng
  5. Chang Ma
  6. Zhangyue Yin
  7. Jianing Wang
  8. Chengcheng Han
  9. Renyu Zhu
  10. Shuai Yuan
  11. Qipeng Guo
  12. Xipeng Qiu
  13. Pengcheng Yin
  14. Xiaoli Li
  15. Fei Yuan
  16. Lingpeng Kong
  17. Xiang Li
  18. Zhiyong Wu

Summary

A Survey of Neural Code Intelligence: Evolution, Current Paradigms, and Future Directions

Introduction

Neural Code Intelligence (NCI) encompasses methodologies that leverage deep learning to understand, generate, and optimize source code. Emerging from the confluence of advances in deep learning and the availability of extensive code data, NCI has quickly become an area of significant research interest and practical relevance. This survey reviews the evolution of NCI, tracing its progression from early models built on recurrent neural networks to the present-day dominance of LLMs for code.

Preliminaries on Code Embeddings and Tasks

Early work in NCI focused on embedding code snippets as vectors, paving the way for applying neural networks to code-related tasks. This section revisits classical methods for processing code with neural language models, elaborates on strategies for integrating structural information, and provides an overview of quintessential code-related tasks.
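
To make the embedding step concrete, the minimal sketch below maps code tokens to learned vectors and mean-pools them into a single snippet representation, in the spirit of early token-level approaches. It is illustrative only, not any specific published model; the vocabulary and dimensions are toy values.

```python
import torch
import torch.nn as nn

# Toy vocabulary of code tokens; real systems learn vocabularies
# from large corpora (e.g., via BPE).
vocab = {"def": 0, "add": 1, "(": 2, "a": 3, ",": 4, "b": 5,
         ")": 6, ":": 7, "return": 8, "+": 9}

class SnippetEncoder(nn.Module):
    """Embed each code token, then mean-pool into one snippet vector."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # (seq_len, dim) -> (dim,)
        return self.embed(token_ids).mean(dim=0)

tokens = "def add ( a , b ) : return a + b".split()
ids = torch.tensor([vocab[t] for t in tokens])
vec = SnippetEncoder(len(vocab))(ids)
print(vec.shape)  # torch.Size([64])
```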

Evolution of Code Pre-trained Models

Following the success of transformer-based pre-trained language models in NLP, a parallel paradigm shift in code intelligence produced Code Pre-trained Models (CodePTMs). This section provides a comprehensive review, discussing representative models, how they incorporate code structure, and studies of their inner mechanisms and robustness.
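
As a concrete example of a CodePTM in use, the snippet below queries CodeBERT's masked-token head through Hugging Face's fill-mask pipeline. It assumes the publicly released microsoft/codebert-base-mlm checkpoint (the variant with a masked-language-modeling head) and its RoBERTa-style <mask> token; the predictions are illustrative rather than a benchmark.

```python
from transformers import pipeline

# Masked-token prediction with CodeBERT. Assumes the public
# microsoft/codebert-base-mlm checkpoint and its RoBERTa-style
# <mask> token; outputs are illustrative only.
fill = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

masked = "def max(a, b): return a if a > b <mask> b"
for pred in fill(masked, top_k=3):
    print(pred["token_str"], round(pred["score"], 3))
```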

The Era of LLMs for Code

The adoption of LLMs has opened a new frontier in code intelligence. This section traces the development of CodeLLMs, detailing prominent models in terms of their architectures, training data, learning objectives, and application paradigms.
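
The sketch below shows the typical CodeLLM usage pattern, left-to-right code completion. Salesforce/codegen-350M-mono is chosen here only because it is a small, openly available checkpoint; any open CodeLLM could be substituted.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Left-to-right code completion with a small open CodeLLM.
name = "Salesforce/codegen-350M-mono"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = 'def is_prime(n):\n    """Return True if n is prime."""\n'
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```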

Synergies in Machine Intelligence

This section surveys emerging research on the synergies between code intelligence and broader machine intelligence, including new reasoning paradigms enabled by code generation, mathematical abilities enhanced by training on code, and the use of code as an intermediary format for solving typical NLP tasks.
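
One such synergy, program-aided reasoning (in the spirit of PAL and Program of Thoughts), has the model emit executable code instead of free-text arithmetic, with the interpreter producing the final answer. In the sketch below the "generated" program is hardcoded for illustration; in practice it would be produced by a CodeLLM conditioned on the question.

```python
# Program-aided reasoning: the model writes executable code rather
# than a free-text chain of thought, and the Python interpreter
# computes the final answer.
question = "Alice has 5 boxes of 12 apples and gives away 17. How many remain?"

# Hardcoded stand-in for a CodeLLM's output.
generated_program = """
boxes, per_box, given_away = 5, 12, 17
answer = boxes * per_box - given_away
"""

scope = {}
exec(generated_program, scope)  # offload arithmetic to the interpreter
print(scope["answer"])  # 43
```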

Practical Applications

This section shifts focus to real-world applications, examining how neural code intelligence is actively applied in software engineering workflows, data-driven decision-making, building intelligent agents, and AI4Science research.

Opportunities and Future Directions

Despite significant advances, numerous open challenges and promising research directions remain in neural code intelligence. This section outlines potential future developments, including exploring architectures beyond the Transformer, revitalizing the use of code features, and improving the efficiency of CodeLLMs.

Conclusion

This survey encapsulates the dynamic evolution of neural code intelligence, illustrating its progression from foundational models to the current era of LLMs for code and its practical applications. By highlighting future research directions, this paper aims to inspire continued development and research in this rapidly evolving field.