Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Published 6 Aug 2025 in cs.AI and cs.CL | (2508.13167v1)

Abstract: Recent advances in LLMs and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built upon manual prompt/workflow engineering with sophisticated agent frameworks, making them computationally inefficient, less capable, and unable to benefit from data-centric learning. In this work, we introduce Chain-of-Agents (CoA), a novel paradigm of LLM reasoning that enables native end-to-end complex problem-solving in the same way as a multi-agent system (i.e., multi-turn problem solving with multiple tools and multiple agents) within one model. In chain-of-agents problem-solving, the model dynamically activates different tool agents and role-playing agents to simulate multi-agent collaboration in an end-to-end fashion. To elicit end-to-end chain-of-agents problem-solving abilities in LLMs, we introduce a multi-agent distillation framework to distill state-of-the-art multi-agent systems into chain-of-agents trajectories for agentic supervised fine-tuning. We then use agentic reinforcement learning on verifiable agentic tasks to further improve the models' capabilities on chain-of-agents problem solving. We call the resulting models Agent Foundation Models (AFMs). Our empirical studies demonstrate that AFM establishes new state-of-the-art performance across diverse benchmarks in both web agent and code agent settings. We make the entire research, including the model weights, code for training and evaluation, and the training data, fully open-sourced, which offers a solid starting point for future research on agent models and agentic RL.

Authors (30)

Summary

The paper introduces the Chain-of-Agents (CoA) paradigm, in which a single model performs multi-agent-style problem solving end to end, trained with multi-agent distillation followed by agentic reinforcement learning.
The multi-agent distillation step converts successful multi-agent system (MAS) trajectories into CoA training data, so that the model can learn complex reasoning patterns.
The paper reports state-of-the-art performance on benchmarks such as GAIA and BrowseComp, along with robust behavior when the model is given unseen tools.

The paper "Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL" introduces Chain-of-Agents (CoA), a paradigm of LLM reasoning in which a single model solves complex problems end to end in the manner of a multi-agent system. The models are trained with multi-agent distillation followed by agentic reinforcement learning, and the resulting models are called Agent Foundation Models (AFMs).

Introduction and Background

Multi-agent systems (MAS) have made strides in complex problem-solving by enabling collaboration among diverse agents equipped with specialized toolsets. Despite this progress, their reliance on manual prompt engineering and complex workflow design limits efficiency, makes capabilities hard to improve through training, and hinders adaptation to new domains. Earlier approaches such as Tool-Integrated Reasoning (TIR) models incorporate tool use into the reasoning process, yet they do not support end-to-end execution of multi-agent workflows.

The CoA paradigm addresses these limitations by simulating multi-agent collaboration within a single model framework. It activates various agents dynamically to mimic collaborative efforts, reducing computational overhead and supporting both supervised fine-tuning and reinforcement learning.

Method

Chain-of-Agents Paradigm

The CoA paradigm extends TIR by letting the model activate agents flexibly, drawing on a range of role-playing agents and tool agents rather than following a fixed loop. Multi-agent collaboration is thus modeled inside one model: the reasoning state persists across the whole trajectory, and the model moves between agents dynamically as the task requires.

Figure 1: Illustration of TIR and CoA paradigms. TIR uses a static "Think-Action-Observation" workflow whereas CoA supports any workflow that can be modeled by a multi-agent system, supporting more diverse role-playing agents and tool agents.

Because CoA runs within a single model, it eliminates the redundant inter-agent communication of conventional multi-agent frameworks, which improves computational efficiency and preserves context across steps.
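The control flow described in this section can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Step`, `Trajectory`, `run_chain_of_agents`, and `scripted_policy`, as well as the agent names `plan`, `search`, and `reflect`, are all hypothetical. In an AFM the policy would be the model itself choosing which agent to activate next; here a scripted stand-in is used so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    agent: str    # name of the role-playing or tool agent that produced this step
    content: str  # text produced by that agent


@dataclass
class Trajectory:
    query: str
    steps: List[Step] = field(default_factory=list)


# Hypothetical types: a policy reads the shared trajectory and picks the
# next agent plus its input; an agent maps input text to output text.
Policy = Callable[[Trajectory], Tuple[str, str]]
Agent = Callable[[str], str]


def run_chain_of_agents(query: str, policy: Policy,
                        agents: Dict[str, Agent], max_steps: int = 8) -> Trajectory:
    """Run one trajectory in a single shared context (no inter-agent messaging)."""
    traj = Trajectory(query)
    for _ in range(max_steps):
        name, payload = policy(traj)
        if name == "answer":  # the policy decided to stop and answer
            traj.steps.append(Step("answer", payload))
            break
        traj.steps.append(Step(name, agents[name](payload)))
    return traj


def scripted_policy(traj: Trajectory) -> Tuple[str, str]:
    """Stand-in for the model: a fixed plan -> search -> reflect -> answer order."""
    order = ["plan", "search", "reflect"]
    n = len(traj.steps)
    previous = traj.steps[-1].content if traj.steps else traj.query
    return (order[n], previous) if n < len(order) else ("answer", previous)


# Toy agents (assumptions for illustration only).
agents: Dict[str, Agent] = {
    "plan": lambda text: f"plan for: {text}",
    "search": lambda text: f"results for: {text}",
    "reflect": lambda text: f"checked: {text}",
}

traj = run_chain_of_agents("q", scripted_policy, agents)
activated = [step.agent for step in traj.steps]
```

Every step is appended to the same `Trajectory`, so each decision sees the full history. That shared state is the property the section attributes to CoA, in contrast to passing messages between separately prompted agents.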

Agentic Supervised Fine-tuning

The paper introduces multi-agent distillation, which combines agent-level and sequence-level knowledge distillation. Successful trajectories from existing multi-agent systems are converted into CoA-compatible trajectories and added to the training data, so that the model can learn the complex reasoning patterns those systems exhibit.

Figure 2: Illustration of the proposed multi-agent distillation framework, which synthesizes Chain-of-Agents trajectories with state-of-the-art multi-agent systems such as OAgents.
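As a rough illustration of the conversion step, the sketch below keeps only successful multi-agent runs and flattens each one into a single tagged sequence suitable for supervised fine-tuning. Everything here is an assumption made for the example: the `MASRun` and `MASMessage` records, the `<agent>...</agent>` tag format, and the exact-match success check are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MASMessage:
    agent: str    # which agent in the multi-agent system spoke
    content: str


@dataclass
class MASRun:
    query: str
    messages: List[MASMessage]
    final_answer: str
    reference_answer: str


def is_successful(run: MASRun) -> bool:
    # Assumed success criterion: case-insensitive exact match with the reference.
    return run.final_answer.strip().lower() == run.reference_answer.strip().lower()


def to_coa_text(run: MASRun) -> str:
    # Flatten the multi-agent log into one sequence, tagging each span
    # with the agent that produced it (hypothetical tag format).
    parts = [f"<{m.agent}>{m.content}</{m.agent}>" for m in run.messages]
    parts.append(f"<answer>{run.final_answer}</answer>")
    return "".join(parts)


def distill(runs: List[MASRun]) -> List[Dict[str, str]]:
    # Keep only successful runs and emit prompt/completion pairs.
    return [
        {"prompt": run.query, "completion": to_coa_text(run)}
        for run in runs
        if is_successful(run)
    ]


runs = [
    MASRun(
        query="What is the capital of France?",
        messages=[
            MASMessage("planner", "find capital"),
            MASMessage("search", "Paris is the capital"),
        ],
        final_answer="Paris",
        reference_answer="paris",
    ),
    MASRun(
        query="What is the capital of France?",
        messages=[MASMessage("planner", "guess")],
        final_answer="Lyon",
        reference_answer="Paris",
    ),
]

dataset = distill(runs)
```

Filtering on success before flattening means the student model only imitates trajectories that reached a correct answer, which matches the summary's description of using successful trajectories.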

Reinforcement Learning and Reward Design

Agentic RL optimizes tool orchestration, balancing correctness and tool efficiency. Distinct reward functions cater to the web and code agent scenarios.
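The summary does not give the reward formulas, so the following Python sketch only illustrates the general idea of trading off correctness against tool usage, with a separate correctness check per scenario. The function names, the `free_calls` budget, the `penalty` value, and both correctness checks are assumptions chosen for the example, not the paper's reward design.

```python
from typing import List


def web_correct(prediction: str, reference: str) -> bool:
    # Assumed stand-in for a web-task answer check: normalized exact match.
    return prediction.strip().lower() == reference.strip().lower()


def code_correct(test_results: List[bool]) -> bool:
    # Assumed stand-in for a code-task check: every unit test must pass.
    return len(test_results) > 0 and all(test_results)


def shaped_reward(correct: bool, tool_calls: int,
                  free_calls: int = 2, penalty: float = 0.25) -> float:
    """Hypothetical reward: 1 for a correct answer, reduced for extra tool calls."""
    if not correct:
        return 0.0  # an incorrect answer earns nothing, however few tools were used
    extra = max(0, tool_calls - free_calls)
    return max(0.0, 1.0 - penalty * extra)


web_reward = shaped_reward(web_correct("Paris ", "paris"), tool_calls=4)
code_reward = shaped_reward(code_correct([True, True, False]), tool_calls=1)
```

Gating on correctness first keeps the efficiency term from rewarding short but wrong trajectories; the penalty only separates correct trajectories from one another.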

Experiments

Empirical results demonstrate AFM's state-of-the-art performance across multiple benchmarks, including GAIA and BrowseComp, highlighting its effectiveness in web agent tasks. The framework consistently surpasses traditional TIR methods in efficiency and reasoning capabilities.

Figure 3: Performance comparison of AFM with the proposed Chain-of-Agents paradigm against state-of-the-art tool-integrated reasoning (TIR) methods on GAIA, BrowseComp, HLE, and AIME25 benchmarks. AFM demonstrates consistent effectiveness across web agent and code agent benchmarks.

The study also reports that AFM generalizes well: it continues to function robustly with tools it did not see during training, for example in dynamic web environments.

Conclusion

CoA integrates dynamic multi-agent collaboration into a single model, trained with multi-agent distillation and agentic RL. This approach points toward more efficient, scalable, and adaptive agent systems, and the open-sourced model weights, code, and training data offer a starting point for future work on agent models and agentic RL.
