
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond (2403.14734v4)

Published 21 Mar 2024 in cs.SE, cs.AI, cs.CL, and cs.PL

Abstract: Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronological review of the advancements in code intelligence, encompassing over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works. We follow the historical progression to trace the paradigm shifts across different research phases (e.g., from modeling code with recurrent neural networks to the era of LLMs). Concurrently, we highlight the major technical transitions in models, tasks, and evaluations spanning through different stages. For applications, we also observe a co-evolving shift. It spans from initial endeavors to tackling specific scenarios, through exploring a diverse array of tasks during its rapid expansion, to currently focusing on tackling increasingly complex and varied real-world challenges. Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains. Finally, we delve into both the opportunities and challenges associated with this field, alongside elucidating our insights on the most promising research directions. An ongoing, dynamically updated project and resources associated with this survey have been released at https://github.com/QiushiSun/Awesome-Code-Intelligence.

Authors (18)
  1. Qiushi Sun
  2. Zhirui Chen
  3. Fangzhi Xu
  4. Kanzhi Cheng
  5. Chang Ma
  6. Zhangyue Yin
  7. Jianing Wang
  8. Chengcheng Han
  9. Renyu Zhu
  10. Shuai Yuan
  11. Qipeng Guo
  12. Xipeng Qiu
  13. Pengcheng Yin
  14. Xiaoli Li
  15. Fei Yuan
  16. Lingpeng Kong
  17. Xiang Li
  18. Zhiyong Wu

Summary

A Survey of Neural Code Intelligence: Evolution, Current Paradigms, and Future Directions

Introduction

Neural Code Intelligence (NCI) encompasses methodologies that leverage deep learning to understand, generate, and optimize source code. Emerging from the confluence of advances in deep learning and the availability of extensive code data, NCI has quickly become an area of significant research interest and practical relevance. This survey reviews the evolution of NCI, tracing its progression from early models built on recurrent neural networks to the present-day dominance of LLMs for code.

Preliminaries on Code Embeddings and Tasks

Early work in NCI focused on embedding code snippets as vectors, paving the way for applying neural networks to code-related tasks. This section revisits classical methods for processing code with neural language models, elaborates on strategies for integrating structural information, and provides an overview of quintessential code-related tasks.
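
To make the embedding step concrete, the minimal sketch below maps code tokens to learned vectors and mean-pools them into a single snippet representation, in the spirit of early token-level approaches. It is illustrative only, not any specific published model; the vocabulary and dimensions are toy values.

```python
import torch
import torch.nn as nn

# Toy vocabulary of code tokens; real systems learn vocabularies
# from large corpora (e.g., via BPE).
vocab = {"def": 0, "add": 1, "(": 2, "a": 3, ",": 4, "b": 5,
         ")": 6, ":": 7, "return": 8, "+": 9}

class SnippetEncoder(nn.Module):
    """Embed each code token, then mean-pool into one snippet vector."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # (seq_len, dim) -> (dim,)
        return self.embed(token_ids).mean(dim=0)

tokens = "def add ( a , b ) : return a + b".split()
ids = torch.tensor([vocab[t] for t in tokens])
vec = SnippetEncoder(len(vocab))(ids)
print(vec.shape)  # torch.Size([64])
```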

Evolution of Code Pre-trained Models

Following the success of transformer-based pre-trained language models in NLP, a parallel paradigm shift in code intelligence produced Code Pre-trained Models (CodePTMs). This section provides a comprehensive review, discussing representative models, how they incorporate code structure, and studies of their inner mechanisms and robustness.
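
As a concrete example of a CodePTM in use, the snippet below queries CodeBERT's masked-token head through Hugging Face's fill-mask pipeline. It assumes the publicly released microsoft/codebert-base-mlm checkpoint (the variant with a masked-language-modeling head) and its RoBERTa-style <mask> token; the predictions are illustrative rather than a benchmark.

```python
from transformers import pipeline

# Masked-token prediction with CodeBERT. Assumes the public
# microsoft/codebert-base-mlm checkpoint and its RoBERTa-style
# <mask> token; outputs are illustrative only.
fill = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

masked = "def max(a, b): return a if a > b <mask> b"
for pred in fill(masked, top_k=3):
    print(pred["token_str"], round(pred["score"], 3))
```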

The Era of LLMs for Code

The adoption of LLMs has opened a new frontier in code intelligence. This section traces the development of CodeLLMs, detailing prominent models in terms of their architectures, training data, learning objectives, and application paradigms.
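
The sketch below shows the typical CodeLLM usage pattern, left-to-right code completion. Salesforce/codegen-350M-mono is chosen here only because it is a small, openly available checkpoint; any open CodeLLM could be substituted.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Left-to-right code completion with a small open CodeLLM.
name = "Salesforce/codegen-350M-mono"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = 'def is_prime(n):\n    """Return True if n is prime."""\n'
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```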

Synergies in Machine Intelligence

This section surveys emerging research on the synergies between code intelligence and broader machine intelligence, including new reasoning paradigms enabled by code generation, mathematical abilities enhanced by training on code, and the use of code as an intermediary format for solving typical NLP tasks.
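
One such synergy, program-aided reasoning (in the spirit of PAL and Program of Thoughts), has the model emit executable code instead of free-text arithmetic, with the interpreter producing the final answer. In the sketch below the "generated" program is hardcoded for illustration; in practice it would be produced by a CodeLLM conditioned on the question.

```python
# Program-aided reasoning: the model writes executable code rather
# than a free-text chain of thought, and the Python interpreter
# computes the final answer.
question = "Alice has 5 boxes of 12 apples and gives away 17. How many remain?"

# Hardcoded stand-in for a CodeLLM's output.
generated_program = """
boxes, per_box, given_away = 5, 12, 17
answer = boxes * per_box - given_away
"""

scope = {}
exec(generated_program, scope)  # offload arithmetic to the interpreter
print(scope["answer"])  # 43
```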

Practical Applications

This section shifts focus to real-world applications, examining how neural code intelligence is actively applied in software engineering workflows, data-driven decision-making, building intelligent agents, and AI4Science research.

Opportunities and Future Directions

Despite significant advances, numerous open challenges and promising research directions remain in neural code intelligence. This section outlines potential future developments, including exploring architectures beyond the Transformer, revitalizing the use of code features, and improving the efficiency of CodeLLMs.

Conclusion

This survey encapsulates the dynamic evolution of neural code intelligence, illustrating its progression from foundational models to the current era of LLMs for code and its practical applications. By highlighting future research directions, this paper aims to inspire continued development and research in this rapidly evolving field.