
MapCoder: Multi-Agent Code Generation for Competitive Problem Solving (2405.11403v1)

Published 18 May 2024 in cs.CL and cs.AI

Abstract: Code synthesis, which requires a deep understanding of complex natural language problem descriptions, generation of code instructions for complex algorithms and data structures, and the successful execution of comprehensive unit tests, presents a significant challenge. While LLMs demonstrate impressive proficiency in natural language processing, their performance in code generation tasks remains limited. In this paper, we introduce a new approach to code generation tasks leveraging multi-agent prompting that uniquely replicates the full cycle of program synthesis as observed in human developers. Our framework, MapCoder, consists of four LLM agents specifically designed to emulate the stages of this cycle: recalling relevant examples, planning, code generation, and debugging. After conducting thorough experiments, with multiple LLM ablations and analyses across eight challenging competitive problem-solving and program synthesis benchmarks, MapCoder showcases remarkable code generation capabilities, achieving new state-of-the-art results (pass@1) on HumanEval (93.9%), MBPP (83.1%), APPS (22.0%), CodeContests (28.5%), and xCodeEval (45.3%). Moreover, our method consistently delivers superior performance across various programming languages and varying problem difficulties. We open-source our framework at https://github.com/Md-Ashraful-Pramanik/MapCoder.
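
The abstract describes a four-stage agent pipeline (example recall, planning, coding, debugging). The sketch below illustrates how such a pipeline could be wired together; it is a minimal illustration under stated assumptions, not the authors' implementation. The function names, prompts, `call_llm`, and `run_sample_tests` are hypothetical placeholders; the real framework is available at the GitHub link above.

```python
# Minimal sketch of a MapCoder-style four-agent pipeline.
# `call_llm` and `run_sample_tests` are hypothetical placeholders, not the authors' API.

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to any chat-completion model and return its text."""
    raise NotImplementedError

def run_sample_tests(code: str, tests) -> tuple[bool, str]:
    """Placeholder: execute `code` against sample I/O pairs; return (passed, error_log)."""
    raise NotImplementedError

def retrieval_agent(problem: str, k: int = 3) -> str:
    # Stage 1: recall k similar solved problems with their plans and solutions.
    return call_llm(f"Recall {k} problems similar to:\n{problem}\n"
                    "Give their plans and example solutions.")

def planning_agent(problem: str, examples: str) -> list[str]:
    # Stage 2: produce candidate step-by-step plans, guided by the recalled examples.
    plans = call_llm(f"Examples:\n{examples}\n\nWrite step-by-step plans for:\n{problem}")
    return plans.split("\n\n")  # naive split into individual candidate plans

def coding_agent(problem: str, plan: str) -> str:
    # Stage 3: translate one plan into code.
    return call_llm(f"Problem:\n{problem}\nPlan:\n{plan}\nWrite a complete Python solution.")

def debugging_agent(problem: str, code: str, error: str) -> str:
    # Stage 4: revise the code given failing sample tests.
    return call_llm(f"Problem:\n{problem}\nCode:\n{code}\nIt failed with:\n{error}\nFix it.")

def mapcoder_style_pipeline(problem: str, tests, max_debug_rounds: int = 3) -> str:
    examples = retrieval_agent(problem)
    code = ""
    for plan in planning_agent(problem, examples):
        code = coding_agent(problem, plan)
        for _ in range(max_debug_rounds):
            passed, error = run_sample_tests(code, tests)
            if passed:
                return code      # first plan whose code passes the sample tests wins
            code = debugging_agent(problem, code, error)
    return code                  # fall back to the last attempt
```

The key design choice the abstract points to is that debugging is nested inside planning: each candidate plan gets its own generate-and-repair loop before the next plan is tried, mirroring how a human contestant iterates on one approach before abandoning it.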

Authors (3)
  1. Md. Ashraful Islam (11 papers)
  2. Mohammed Eunus Ali (37 papers)
  3. Md Rizwan Parvez (24 papers)
Citations (20)