Chain-of-Agents (CoA) Framework
- Chain-of-Agents (CoA) is an architectural paradigm for distributed multi-agent reasoning that segments tasks among specialized agents.
- It employs protocols, shared memory, and sequential message passing to efficiently process long-context inputs and multi-hop reasoning challenges.
- CoA frameworks improve performance in domains such as supply chain management, reinforcement learning, and retrieval-augmented generation, while contending with orchestration complexity.
Chain-of-Agents (CoA) is an architectural and algorithmic paradigm for orchestrating distributed decision-making, reasoning, and tool use across sequences or networks of specialized agents. These agents collaborate through explicit protocols, shared memory, or implicit role activation, enabling systems to efficiently process long contexts, solve multi-stage reasoning tasks, and coordinate complex workflows. CoA frameworks appear across supply chain management, multi-agent reinforcement learning, LLM orchestration, retrieval-augmented generation, and next-generation web agent infrastructures, with implementations ranging from modular protocol stacks to end-to-end agentic models.
1. Paradigm and Formalization
The Chain-of-Agents approach generalizes multi-agent collaboration by structuring problem-solving into chains or directed networks of autonomous actors—each tasked with a well-defined role (e.g., information extraction, decision-making, negotiation, tool invocation). A canonical CoA system segments input (text, knowledge base, environment state) so that each agent processes a manageable subset or aspect, passing context or results downstream. Mathematical formalization in coagent networks (Zini et al., 2020) extends the notion to arbitrary stochastic agent networks producing ordered “execution paths”:
$$\tau = \big((s_1, a_1), (s_2, a_2), \ldots, (s_T, a_T)\big),$$
where $(s_i, a_i)$ denotes the state-action pair for agent $i$. This abstraction admits synchronous, asynchronous, hierarchical, and cyclic agent topologies, facilitating robust composition for reinforcement learning and cooperative planning.
In long-context LLM systems (Zhang et al., 4 Jun 2024), CoA divides a large input into chunks $c_1, \ldots, c_\ell$ and routes each chunk through a worker agent with its own local context, aggregating intermediate findings via communication units (CU):
$$\mathrm{CU}_i = \mathrm{LLM}_{W_i}(c_i, \mathrm{CU}_{i-1}, q),$$
where $q$ is the query and $W_i$ is the $i$-th worker. A manager agent then synthesizes the global response from the final aggregated unit $\mathrm{CU}_\ell$, ensuring full input coverage and focused reasoning.
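A minimal sketch of this worker-manager loop, assuming a generic `llm(prompt)` completion function and a simple character-based chunker (both hypothetical stand-ins, not the cited implementation):

```python
# Minimal sketch of the CoA worker-manager loop over chunked input.
# `llm` is a hypothetical completion function; plug in any LLM API.

def llm(prompt: str) -> str:
    raise NotImplementedError("swap in a real LLM completion call")

def chunks(text: str, size: int = 4000) -> list[str]:
    """Split the long input into worker-sized segments."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def chain_of_agents(long_text: str, query: str) -> str:
    cu = ""  # communication unit passed along the worker chain
    for c in chunks(long_text):
        cu = llm(
            f"Previous findings: {cu}\n"
            f"Source segment: {c}\n"
            f"Question: {query}\n"
            "Update the findings with evidence from this segment."
        )
    # The manager synthesizes the final answer from the last CU.
    return llm(f"Findings: {cu}\nQuestion: {query}\nAnswer concisely.")
```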
2. Agent Roles, Workflow Structures, and Protocols
Agent roles in CoA frameworks are domain-specific but typically include:
- Planner/Administrator: Initializes the workflow, decomposes tasks or queries into actionable subcomponents (Ezzeddine et al., 2012, Gupta et al., 6 Oct 2025).
- Worker: Processes local context, extracts facts, performs inference or tool calls, and refines questions or memory (Zhang et al., 4 Jun 2024, Gupta et al., 6 Oct 2025).
- Manager/Selection/Broker: Aggregates results, selects optimal responses or contracts, reconciles agent outputs, synthesizes final answers (Ezzeddine et al., 2012, Zhang et al., 4 Jun 2024).
Multi-agent protocols may rely on:
- Free-form summaries or messages: Sequential contextual abstraction (Zhang et al., 4 Jun 2024).
- Centralized structured memory: Persistent, auditable collective state updated in fixed micro-cycles (Extract, Infer, Refine) (Gupta et al., 6 Oct 2025).
- Semantic Web Services and ontology-based matchmaking: Ensures interoperability and semantic agreement (Ezzeddine et al., 2012).
Protocol stacks such as Coral Protocol (Georgio et al., 30 Apr 2025) and Co-TAP (An et al., 9 Oct 2025) extend CoA to Internet-scale ecosystems by enforcing standardized JSON messaging, secure team formation via cryptographic contracts, modular coordination, and unified service registration. These infrastructures are engineered for vendor neutrality, automatic agent discovery, and robust cross-domain collaboration.
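For illustration only (the actual Coral and Co-TAP schemas are not reproduced here), a sketch of what a standardized JSON message envelope for inter-agent traffic might look like; all field names are assumptions:

```python
# Hypothetical message envelope in the spirit of standardized agent
# messaging; field names are illustrative, not the Coral/Co-TAP schema.
import json
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class AgentMessage:
    sender: str         # registered agent identifier
    recipient: str      # target agent or broadcast channel
    performative: str   # e.g. "request", "inform", "propose"
    payload: dict       # task-specific structured content
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))

msg = AgentMessage(
    sender="planner-01",
    recipient="worker-03",
    performative="request",
    payload={"task": "extract_entities", "segment_id": 7},
)
print(json.dumps(asdict(msg), indent=2))  # wire format
```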
3. Mathematical Principles and Learning Algorithms
CoA implementations encode agent reasoning and collaboration via explicit optimization and learning objectives. In RL-based coagent networks (Zini et al., 2020):
- The policy of the overall agent is a product over per-agent policies along an execution path,
$$\pi_\theta(\tau) = \prod_{i=1}^{T} \pi_{\theta_i}(a_i \mid s_i),$$
with global policy gradients distributed across agents as
$$\nabla_\theta J(\theta) = \mathbb{E}_\tau\!\left[ R(\tau) \sum_{i=1}^{T} \nabla_{\theta} \log \pi_{\theta_i}(a_i \mid s_i) \right].$$
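A toy numeric sketch of this distributed policy gradient, assuming tabular softmax per-agent policies (all names and shapes are illustrative):

```python
# REINFORCE over an execution path whose policy factorizes across agents.
# Tabular softmax policies; everything here is an illustrative toy.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_states, n_actions = 3, 5, 4
theta = rng.normal(size=(n_agents, n_states, n_actions))  # per-agent logits

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def update(path, reward, lr=0.1):
    """path: list of (agent, state, action) triples along an execution path."""
    for i, s, a in path:
        probs = softmax(theta[i, s])
        grad_logp = -probs
        grad_logp[a] += 1.0                      # d/dtheta log pi(a|s)
        theta[i, s] += lr * reward * grad_logp   # REINFORCE step

# One Monte Carlo update from a sampled path and its return.
path = [(0, 2, 1), (1, 0, 3), (2, 4, 0)]
update(path, reward=1.0)
```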
Agentic RL and multi-agent distillation frameworks (Li et al., 6 Aug 2025, Zhang et al., 9 Mar 2025) leverage supervised fine-tuning on multi-agent trajectories and reinforcement learning with outcome-driven reward functions:
- Contrastive and auxiliary losses are employed for action triggering, e.g., teaching the model when to invoke a tool or hand off to another agent rather than continue free-form generation.
- RL objectives optimize relative advantage across groups of sampled trajectories; with group-normalized (GRPO-style) rewards, the advantage of the $i$-th sample in a group of size $G$ is
$$\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_j\}_{j=1}^{G})}{\operatorname{std}(\{r_j\}_{j=1}^{G})}.$$
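A minimal sketch of the group-normalized advantage computation above, assuming scalar outcome rewards per sampled trajectory:

```python
# Group-relative advantages in the GRPO style: each trajectory's reward
# is normalized against the other samples drawn for the same prompt.
import numpy as np

def group_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """rewards: shape (group_size,) outcome rewards for one prompt."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 rollouts for the same task, outcome-scored 0/1.
print(group_advantages(np.array([1.0, 0.0, 0.0, 1.0])))
```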
Collaborative knowledge fusion is formalized via explicit and tacit knowledge graphs (Zhao et al., 16 May 2025), with counterfactual variables, semantic similarity computed via an embedding-based measure such as cosine similarity,
$$\operatorname{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \,\lVert v \rVert},$$
and aggregation of multi-stage prompts from the prompt-tree structure.
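A small sketch of the embedding-similarity step, assuming node embeddings have already been produced by some encoder (the selection heuristic is illustrative, not Cochain's):

```python
# Cosine similarity between knowledge-graph node embeddings; embeddings
# are assumed to come from any sentence encoder.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def most_similar(query_vec: np.ndarray,
                 node_vecs: dict[str, np.ndarray]) -> str:
    """Pick the explicit/tacit knowledge node closest to the query."""
    return max(node_vecs, key=lambda k: cosine(query_vec, node_vecs[k]))
```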
4. Efficient Task Decomposition and Reasoning
CoA architectures excel at decomposing complex, long-horizon, and multi-hop reasoning tasks. In supply chain management (Ezzeddine et al., 2012), specialized agents coordinate ontology-driven negotiation, service discovery, and proposal selection via utility functions, e.g., a weighted sum over proposal attributes,
$$U(p) = \sum_{j} w_j \, v_j(p_j),$$
where $w_j$ weights attribute $j$ (price, delivery time, quality) and $v_j$ normalizes its value.
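A hedged sketch of weighted-utility proposal scoring in this spirit; attribute names, normalizations, and weights are illustrative assumptions, not those of the cited framework:

```python
# Weighted-sum utility for ranking supplier proposals; attributes and
# weights are illustrative placeholders.
WEIGHTS = {"price": 0.5, "delivery_days": 0.3, "quality": 0.2}

def utility(proposal: dict) -> float:
    # Normalize so higher is always better (cost terms inverted).
    scores = {
        "price": 1.0 / (1.0 + proposal["price"]),
        "delivery_days": 1.0 / (1.0 + proposal["delivery_days"]),
        "quality": proposal["quality"],  # already in [0, 1]
    }
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

best = max(
    [{"price": 90, "delivery_days": 3, "quality": 0.9},
     {"price": 70, "delivery_days": 7, "quality": 0.8}],
    key=utility,
)
```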
In chain-of-abstraction tool use (Gao et al., 30 Jan 2024), abstract chains are first constructed with symbolic placeholders (e.g., abstract variables such as $y_1, y_2$ standing in for yet-to-be-computed values), decoupling the general reasoning strategy from data-specific computation. Domain tools then reify each placeholder in parallel, reducing inference latency and ensuring robustness to domain shifts.
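A minimal sketch of the abstract-then-reify pattern, with a hypothetical planner output and a toy calculator standing in for domain tools:

```python
# Chain-of-abstraction, sketched: a planner emits a reasoning chain with
# placeholders (y1, y2, ...); tools then reify each placeholder.

# Hypothetical planner output with symbolic placeholders.
abstract_chain = "y1 = 20 + 35; y2 = y1 * 2; answer = y2"

def reify(chain: str) -> dict:
    """Evaluate each placeholder assignment with a toy calculator tool."""
    env: dict[str, float] = {}
    for step in chain.split(";"):
        name, expr = (s.strip() for s in step.split("="))
        # eval restricted to previously bound placeholders (toy calculator)
        env[name] = eval(expr, {"__builtins__": {}}, env)
    return env

print(reify(abstract_chain)["answer"])  # -> 110
```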
Conversational agent frameworks (Pan et al., 28 May 2024) implement dynamic reasoning-retrieval chains, decomposing queries into multi-step sub-questions, systematizing verification via contextual knowledge sets, and quantifying faithfulness with scores that measure how well each generated claim is supported by the retrieved contextual knowledge.
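As an illustration, a toy faithfulness score based on simple token overlap (the cited framework's actual metric is not reproduced):

```python
# Toy faithfulness score: fraction of answer tokens supported by the
# retrieved contextual knowledge set. Illustrative only.
def faithfulness(answer: str, contexts: list[str]) -> float:
    answer_tokens = set(answer.lower().split())
    support = set(" ".join(contexts).lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & support) / len(answer_tokens)
```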
5. Memory and Communication Mechanisms
Communication in CoA can occur via:
- Free-form sequential message passing: Each agent passes summaries that aggregate local inferences, though such designs risk information loss (Zhang et al., 4 Jun 2024).
- Structured memory-centric protocols: As in COSMIR (Gupta et al., 6 Oct 2025), agents update a centralized memory in fixed cycles, enabling full auditability and improved evidence aggregation (see the sketch after this list).
- Standardized messaging formats: JSON schema-based message templates, facilitating cross-vendor interoperability (Georgio et al., 30 Apr 2025).
- Layered cognitive protocols: Co-TAP’s MEK protocol implements a memory extraction knowledge pipeline, forming shareable collective intelligence (An et al., 9 Oct 2025).
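A hedged sketch of a centralized-memory micro-cycle in the COSMIR spirit; type, field, and stage names are assumptions:

```python
# Centralized structured memory updated in fixed Extract -> Infer ->
# Refine micro-cycles; each stage callable stands in for an agent call.
from dataclasses import dataclass, field

@dataclass
class SharedMemory:
    facts: list[str] = field(default_factory=list)       # extracted evidence
    inferences: list[str] = field(default_factory=list)  # derived claims
    log: list[str] = field(default_factory=list)         # audit trail

def micro_cycle(memory: SharedMemory, segment: str, extract, infer, refine):
    """One fixed cycle over a new input segment; each stage is an agent."""
    memory.facts += extract(segment)
    memory.inferences += infer(memory.facts)
    memory.inferences = refine(memory.inferences)
    memory.log.append(f"processed segment of {len(segment)} chars")
    return memory
```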
6. Empirical Performance, Applications, and Limitations
Empirical evaluations demonstrate consistent gains for CoA methods across domains:
- LLM long-context tasks: CoA improves QA, summarization, and code completion performance by up to 10% over RAG and full-context methods (Zhang et al., 4 Jun 2024).
- GUI automation: Chain-of-action agents outperform parsing-based baselines in multitask settings, achieving 90% action type accuracy and 74% overall success (Zhang et al., 2023).
- Multi-hop QA with knowledge synergy: CoCoA-zero and CoCoA significantly improve EM/F1 scores on open-domain and multi-hop datasets; long-chain training yields 15% average gains (Jiang et al., 3 Aug 2025).
- End-to-end agentic models: AFMs set new state-of-the-art on web/coding agent benchmarks via chain-of-agents problem-solving (Li et al., 6 Aug 2025).
- Supply chain: Agent-based frameworks with semantic agreements and negotiation ontologies resolve coordination and information flow challenges (Ezzeddine et al., 2012).
Limitations noted include potential system complexity, coordination difficulties, and the need for robust error handling across agent interfaces, especially when agent interoperability or external tool calls are required.
7. Infrastructural and Engineering Foundations
Recent infrastructures such as Coral Protocol (Georgio et al., 30 Apr 2025) and Co-TAP (An et al., 9 Oct 2025) provide standardization for large-scale CoA ecosystems:
- Coral standardizes agent messaging, modular coordination, and secure team formation via cryptographic identifiers and blockchain-backed contracts, fostering interoperability and collective intelligence in the “Internet of Agents.”
- Co-TAP layers human-agent interaction, unified protocol discovery, and cognitive memory-extraction-knowledge sharing to realize seamless inter-agent collaboration and continual learning.
Cochain (Zhao et al., 16 May 2025) further refines agent collaboration in business workflows by integrating explicit/tacit knowledge graphs and prompt trees, achieving near-optimal expert evaluation even with small model backbones.
Chain-of-Agents defines a broad family of frameworks for compositional, multi-agent reasoning, offering scalable, robust approaches for distributed cognition in both domain-specific and general intelligent systems. Its evolution encompasses mathematical formalism, protocol engineering, memory design, and open-source agentic RL, with tangible empirical benefits and expanding cross-domain applicability.