Agents in Software Engineering: Survey, Landscape, and Vision (2409.09030v2)

Published 13 Sep 2024 in cs.SE, cs.AI, and cs.CL

Abstract: In recent years, LLMs have achieved remarkable success and have been widely used in various downstream tasks, especially in the tasks of the software engineering (SE) field. We find that many studies combining LLMs with SE have employed the concept of agents either explicitly or implicitly. However, there is a lack of an in-depth survey to sort out the development context of existing works, analyze how existing works combine the LLM-based agent technologies to optimize various tasks, and clarify the framework of LLM-based agents in SE. In this paper, we conduct the first survey of the studies on combining LLM-based agents with SE and present a framework of LLM-based agents in SE which includes three key modules: perception, memory, and action. We also summarize the current challenges in combining the two fields and propose future opportunities in response to existing challenges. We maintain a GitHub repository of the related papers at: https://github.com/DeepSoftwareAnalytics/Awesome-Agent4SE.



This paper, "Agents in Software Engineering: Survey, Landscape, and Vision," authored by Yanxian Huang et al., surveys work that combines LLMs with agent technologies to improve tasks in software engineering (SE). It presents a conceptual framework for LLM-based agents in SE, identifies open challenges, and proposes future research opportunities.

Core Framework and Components

The proposed framework for LLM-based agents in SE is organized into three interconnected modules: perception, memory, and action.

Perception Module: This module connects the LLM-based agent to the external environment. It processes inputs of different modalities, such as textual, visual, and auditory input, and transforms them into a form the LLM can process. The paper points out that current work leans toward token-based textual inputs and largely overlooks tree/graph-based inputs, which could better capture the structural characteristics of code.
Memory Module: This module consists of semantic, episodic, and procedural memory. Semantic memory is held in external knowledge bases used for retrieval, containing documents, APIs, and other code-related knowledge. Episodic memory holds records of previous interactions and decision-making processes, which are reused for in-context learning. Procedural memory is long-term knowledge stored both implicitly in the LLM's weights and explicitly in the agent's code.
Action Module: Actions of the LLM-based agent are classified as internal or external. Internal actions are reasoning, retrieval, and learning. Reasoning relies on methods such as Chain-of-Thought (CoT) and structured CoT, in which the model works through intermediate steps before answering. Retrieval actions fetch relevant information from knowledge bases to support reasoning. Learning actions update the agent’s implicit and explicit knowledge over time. External actions are interactions with humans, other agents, and digital environments such as compilers and search engines, which supply iterative feedback and additional knowledge.
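
The three modules above can be sketched as a small agent loop. This is an illustrative sketch only, not the implementation described in the paper: every name in it (Memory, Agent, perceive, fake_llm) is hypothetical, the LLM is replaced by a stub function, and retrieval is reduced to a keyword match.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Memory:
    # Semantic memory: external knowledge (docs, API notes) keyed by topic.
    semantic: dict[str, str] = field(default_factory=dict)
    # Episodic memory: earlier (task, answer) pairs, reusable as in-context examples.
    episodic: list[tuple[str, str]] = field(default_factory=list)
    # Procedural memory (explicit part only): the agent's own instructions.
    procedural: str = "You are a coding assistant."


def perceive(raw_input: str) -> str:
    """Perception: normalise a textual input into the form handed to the LLM."""
    return " ".join(raw_input.split())


@dataclass
class Agent:
    llm: Callable[[str], str]  # stand-in for a real model call (assumption)
    memory: Memory = field(default_factory=Memory)

    def retrieve(self, task: str) -> list[str]:
        """Internal action: return semantic entries whose key occurs in the task."""
        lowered = task.lower()
        return [
            text
            for key, text in self.memory.semantic.items()
            if key.lower() in lowered
        ]

    def build_prompt(self, task: str) -> str:
        """Internal action: assemble a step-by-step prompt from memory and the task."""
        parts = [self.memory.procedural]
        parts += [f"Knowledge: {text}" for text in self.retrieve(task)]
        parts += [f"Example: {q} -> {a}" for q, a in self.memory.episodic[-2:]]
        parts.append(f"Task: {task}\nLet's think step by step.")
        return "\n".join(parts)

    def act(self, raw_input: str) -> str:
        task = perceive(raw_input)
        answer = self.llm(self.build_prompt(task))
        # Learning action: store the interaction as a new episode.
        self.memory.episodic.append((task, answer))
        return answer


def fake_llm(prompt: str) -> str:
    """Stub model: reports how many prompt lines it received."""
    return f"received {len(prompt.splitlines())} prompt lines"
```

With `agent = Agent(llm=fake_llm)` and one semantic entry stored under the key "json", calling `agent.act("parse   this JSON string")` builds a four-line prompt (instructions, one knowledge line, and the two-line task). External actions, such as sending the answer to a compiler and feeding the result back, would be further calls inside `act`; they are left out here to keep the sketch short.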

Analysis of Challenges and Opportunities

The paper analyzes the challenges of integrating LLM-based agents into software engineering and identifies promising directions for future research:

Perception Module Exploration: Existing efforts predominantly focus on token-based textual inputs and give little attention to other modalities such as tree/graph-based, visual, and auditory inputs. Exploring these alternative input modalities could make agents more effective across a wider range of SE tasks.
Role-playing Abilities: Many SE tasks require agents to perform multiple roles simultaneously. Current models lack flexibility in assuming diverse roles and in balancing several roles at once. Developing mechanisms that extend the role-playing capabilities of these agents is crucial.
Knowledge Retrieval Base: SE lacks an established, rich, and reliable code-specific knowledge base analogous to Wikipedia in NLP. Constructing such an authoritative knowledge base could improve both the efficiency and the accuracy of LLM-based agents.
Hallucinations in LLM-based Agents: LLM-based agents often produce hallucinations, for example by generating calls to nonexistent APIs. Identifying the root causes of these hallucinations and mitigating them is essential for improving the reliability of agents.
Efficiency of Multi-agent Collaboration: Collaboration between multiple agents is computationally demanding and incurs communication overhead. Techniques are needed that optimize resource allocation, reduce communication costs, and improve overall efficiency in multi-agent systems.
Integration of SE Technologies: Advanced SE techniques can improve the functionality and performance of LLM-based agents. Integrating such techniques into agents remains underexplored and is a promising avenue for future investigation.
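
Two of the directions above, tree-based perception and hallucinated APIs, can be illustrated together. The following is a minimal sketch and not a technique taken from the paper: it uses Python's standard `ast` module to view generated code as a syntax tree rather than a token stream, and `importlib` to check whether each `module.attribute` reference actually exists. The function name `find_unknown_apis` is hypothetical, and the check only covers plain `import x` or `import x as y` statements followed by direct `x.attr` access.

```python
import ast
import importlib


def find_unknown_apis(source: str) -> list[str]:
    """Return 'module.attr' names used in `source` that the real module lacks.

    Note: this imports the referenced modules, so it should only be run on
    code whose imports are safe to execute.
    """
    tree = ast.parse(source)  # tree-based view of the code

    # Map local alias -> real module name for `import x` / `import x as y`.
    aliases: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for item in node.names:
                aliases[item.asname or item.name] = item.name

    unknown: list[str] = []
    for node in ast.walk(tree):
        # Attribute access directly on an imported name, e.g. `json.loads`.
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id in aliases
        ):
            module_name = aliases[node.value.id]
            try:
                module = importlib.import_module(module_name)
            except ImportError:
                unknown.append(module_name)  # the module itself is missing
                continue
            if not hasattr(module, node.attr):
                unknown.append(f"{module_name}.{node.attr}")
    return sorted(set(unknown))


if __name__ == "__main__":
    generated = (
        "import json\n"
        "import math as m\n"
        "data = json.loads('[1]')\n"
        "fast = json.parse_fast('[1]')\n"  # no such function in json
        "root = m.sqroot(4)\n"             # no such function in math
    )
    print(find_unknown_apis(generated))
```

An agent could use a check of this kind as an external action: the list of unknown names is fed back to the model as feedback before the code is ever executed. A production version would also need to handle `from x import y`, nested attributes, and third-party packages that are not installed locally.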

Conclusion

The survey by Huang et al. examines how LLM-based agent technologies are being integrated into software engineering. By categorizing related work within a framework of perception, memory, and action modules, the authors give a structured view of the current landscape. The challenges and opportunities they identify, from richer input modalities to code-specific knowledge bases and more efficient multi-agent collaboration, set out directions for subsequent research on improving and extending LLM-based agents in SE.