MaCTG: Multi-Agent Collaborative Thought Graph for Automatic Programming (2410.19245v2)
Abstract: With the rapid advancement of LLMs, LLM-based approaches have demonstrated strong problem-solving capabilities across various domains. However, in automatic programming, a single LLM is typically limited to function-level code generation, while multi-agent systems composed of multiple LLMs often suffer from inefficient task planning. This lack of structured coordination can lead to cascading hallucinations, where accumulated errors across agents result in suboptimal workflows and excessive computational costs. To overcome these challenges, we introduce MaCTG (Multi-Agent Collaborative Thought Graph), a novel multi-agent framework that employs a dynamic graph structure to facilitate precise task allocation and controlled collaboration among LLM agents. MaCTG autonomously assigns agent roles based on programming requirements, dynamically refines task distribution through context-aware adjustments, and systematically verifies and integrates project-level code, effectively reducing hallucination errors and improving overall accuracy. MaCTG enhances cost-effectiveness by implementing a hybrid LLM deployment, where proprietary models handle complex reasoning, while open-source models are used for routine coding and validation tasks. To evaluate MaCTG's effectiveness, we applied it to traditional image processing auto-programming tasks, achieving a state-of-the-art accuracy of 83.33%. Additionally, by leveraging its hybrid LLM configuration, MaCTG significantly reduced operational costs by 89.09% compared to existing multi-agent frameworks, demonstrating its efficiency, scalability, and real-world applicability.
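As a rough illustration of the hybrid deployment idea described in the abstract, the following minimal sketch routes tasks in a small dependency graph to one of two model tiers. It is not the MaCTG implementation: the `TaskNode` class, the role names, the tier labels, and the rule that only planning roles use the proprietary tier are all assumptions made for this example, and no LLM is actually called.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

# Hypothetical tier labels; the abstract does not name a specific assignment.
PROPRIETARY = "proprietary-llm"
OPEN_SOURCE = "open-source-llm"

# Assumed policy: only reasoning-heavy roles use the proprietary tier.
REASONING_ROLES = {"planner"}


@dataclass
class TaskNode:
    """One node of an illustrative task graph (hypothetical structure)."""
    name: str
    role: str  # e.g. "planner", "coder", "tester"
    depends_on: list[str] = field(default_factory=list)


def pick_model(node: TaskNode) -> str:
    """Choose a model tier from the node's role."""
    return PROPRIETARY if node.role in REASONING_ROLES else OPEN_SOURCE


def schedule(nodes: list[TaskNode]) -> list[tuple[str, str]]:
    """Return (task, model) pairs in an order that respects dependencies."""
    by_name = {n.name: n for n in nodes}
    sorter = TopologicalSorter({n.name: set(n.depends_on) for n in nodes})
    return [(name, pick_model(by_name[name])) for name in sorter.static_order()]


nodes = [
    TaskNode("plan", "planner"),
    TaskNode("write_filter", "coder", ["plan"]),
    TaskNode("test_filter", "tester", ["write_filter"]),
]
plan = schedule(nodes)
for task, model in plan:
    print(f"{task} -> {model}")
```

In this toy setup only the planning step reaches the more expensive tier, which is the intuition behind the cost reduction the abstract reports; the actual role assignment and graph refinement in MaCTG are dynamic and are not modeled here.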