
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation (2312.13010v3)

Published 20 Dec 2023 in cs.CL

Abstract: The advancement of NLP has been significantly boosted by the development of transformer-based LLMs. These models have revolutionized NLP tasks, particularly in code generation, aiding developers in creating software with enhanced efficiency. Despite their advancements, challenges in balancing code snippet generation with effective test case generation and execution persist. To address these issues, this paper introduces Multi-Agent Assistant Code Generation (AgentCoder), a novel solution comprising a multi-agent framework with specialized agents: the programmer agent, the test designer agent, and the test executor agent. During the coding procedure, the programmer agent will focus on the code generation and refinement based on the test executor agent's feedback. The test designer agent will generate test cases for the generated code, and the test executor agent will run the code with the test cases and write the feedback to the programmer. This collaborative system ensures robust code generation, surpassing the limitations of single-agent models and traditional methodologies. Our extensive experiments on 9 code generation models and 12 enhancement approaches showcase AgentCoder's superior performance over existing code generation models and prompt engineering techniques across various benchmarks. For example, AgentCoder (GPT-4) achieves 96.3% and 91.8% pass@1 in HumanEval and MBPP datasets with an overall token overhead of 56.9K and 66.3K, while state-of-the-art obtains only 90.2% and 78.9% pass@1 with an overall token overhead of 138.2K and 206.5K.

AgentCoder: Multi-Agent Code Generation with Iterative Testing and Optimisation

Introduction

The paper introduces a novel framework, AgentCoder, aimed at enhancing automated code generation through a multi-agent system. The field of code generation has seen significant advancements with the rise of transformer-based LLMs, although existing approaches often face limitations in balancing code generation quality with effective test case generation. AgentCoder seeks to address these challenges by employing a multi-agent framework comprising the programmer agent, the test designer agent, and the test executor agent. Each agent is tasked with specialized roles that collaboratively enhance the overall efficacy of the code generation process.

Framework Overview

AgentCoder divides the code generation process into distinct phases handled by specialized agents:

  1. Programmer Agent: This agent focuses on generating and refining code snippets based on coding requirements and feedback from the test executor agent. It employs a structured Chain-of-Thought approach, breaking down the task into manageable steps, ranging from problem understanding to actual code implementation.
  2. Test Designer Agent: This agent generates comprehensive test cases aimed at evaluating the correctness of the produced code snippets. It focuses on creating a variety of tests, including basic, edge, and large-scale cases, ensuring thorough code validation.
  3. Test Executor Agent: Operating through Python scripts that interact with a local environment, this agent executes the generated code against the test cases and returns detailed feedback. It repeats this process, prompting the programmer agent to refine the code until all tests pass.

Evaluation and Results

The paper evaluates AgentCoder across four datasets: HumanEval, HumanEval-ET, MBPP, and MBPP-ET, using pass@1 as the primary metric. The experiments involve 12 LLMs and 13 LLM-based optimization approaches.
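For reference, pass@1 is typically computed with the standard unbiased pass@k estimator (the evaluation convention common to these benchmarks; the paper does not spell out its estimator, so this is the usual formulation rather than a quote from it):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n generated samples, c of which pass all tests."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k = 1 this reduces to the fraction of samples that pass, e.g. `pass_at_k(10, 5, 1)` is 0.5.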

Key Findings:

  • Outperformance Across Benchmarks: AgentCoder exhibits superior performance across all datasets, consistently surpassing traditional LLMs and state-of-the-art optimization techniques. For example, using GPT-3.5-turbo, AgentCoder achieves a pass@1 of 77.4% and 89.1% on HumanEval-ET and MBPP-ET, respectively, whereas the closest state-of-the-art approach, CodeCoT, attains only 69.5% and 63.0%.
  • Iterative Enhancement: The framework benefits from iterative refinement. Increasing the number of iterations from 1 to 5 raises pass@1 significantly, showcasing the effectiveness of the feedback loop within AgentCoder.
  • Agent Contribution: The individual contributions of the programmer, test designer, and test executor agents are crucial. The paper demonstrates that combining these roles into a multi-agent system yields higher accuracy and coverage in test generation compared to a single-agent setup.
  • Test Case Accuracy and Coverage: The test designer agent in AgentCoder generates highly accurate and comprehensive test cases. Evaluations show that these test cases achieve a much higher accuracy and code coverage than those generated by other methods such as CodeCoT.

Implications and Future Directions

Practical Implications:

The modularity and scalability of AgentCoder mean it can adapt to technological advancements by easily integrating more sophisticated models. Its focus on iterative testing aligns with practical software development practices, offering developers a reliable tool for both generating and validating code.

Theoretical Implications:

The multi-agent framework can be a promising direction for other AI-driven tasks, suggesting potential benefits in areas requiring complex problem-solving and decision-making processes. It opens avenues for further exploration into the collaborative synergy of specialized agents.

Future Developments:

Given its current achievements, possible future enhancements could involve:

  • Integration with more advanced LLMs: As new models are developed, they can be incorporated into AgentCoder to further refine its capabilities.
  • Exploration of additional agent roles: Expanding the framework to include agents focused on other aspects of software development, such as documentation generation or performance optimization.
  • Application to other domains: Adapting the multi-agent framework to tasks beyond code generation, leveraging its structured, iterative testing and refinement approach.

Conclusion

AgentCoder represents a significant advancement in automated code generation, leveraging a multi-agent collaboration strategy to address the inherent challenges of balancing code quality with effective test generation. Its extensive evaluation illustrates substantial improvements over existing methods, positioning it as a versatile and powerful tool in the evolving landscape of software development and AI. The modular design ensures that AgentCoder remains adaptable and scalable, paving the way for future enhancements and broader applications.

Authors (6)
  1. Dong Huang
  2. Qingwen Bu
  3. Jie M. Zhang
  4. Michael Luck
  5. Heming Cui
  6. Yuhao Qing