Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage
Abstract: Despite their impressive capabilities, LLMs currently exhibit two primary limitations: (I) they struggle to autonomously solve real-world engineering problems, and (II) they remain challenged in reasoning through complex logic problems. To address these challenges, we developed the Infant Agent, which integrates task-aware functions, operators, a hierarchical management system, and a memory retrieval mechanism. Together, these components enable LLMs to sustain extended reasoning processes and handle complex, multi-step tasks efficiently, all while significantly reducing API costs. With the Infant Agent, GPT-4o's accuracy on the SWE-bench-lite dataset rises from 0.33% to 30%, and its accuracy on the AIME-2024 mathematics competition rises from 13.3% to 37%.