Papers
Topics
Authors
Recent
Search
2000 character limit reached

Agent2Agent (A2A) Protocol

Updated 4 July 2026
  • Agent2Agent (A2A) is an inter-agent communication protocol that enables task delegation among autonomous agents using web-native interfaces and standardized JSON-RPC messaging.
  • It leverages Agent Cards, multimodal message parts, and explicit task lifecycles to advertise capabilities and coordinate tasks effectively.
  • A2A enhances security and interoperability in diverse settings—from enterprise orchestration to edge computing—by integrating with protocols like MCP.

Agent2Agent (A2A) is an inter-agent communication protocol for delegating tasks between autonomous software agents through web-native interfaces. Across the recent literature, it is described as a peer-to-peer or client/remote-agent protocol built on HTTP(S), JSON-RPC 2.0, and optional streaming, with machine-readable Agent Cards used to advertise capabilities, authentication requirements, supported modalities, and endpoint metadata (Ehtesham et al., 4 May 2025, Duan et al., 17 Aug 2025). In contemporary agent stacks, A2A is typically positioned as the horizontal coordination layer for agent-to-agent task exchange, distinct from tool-facing protocols such as the Model Context Protocol (MCP), and has been studied in enterprise orchestration, multimodal routing, edge computing, safety-critical assistants, and network control settings (Jeong, 2 Jun 2025, Li et al., 6 May 2025).

1. Architectural model and protocol scope

A2A is commonly framed around two symmetric roles: a client or delegating agent that issues tasks, and a remote or worker agent that fulfills them. One representative architecture places A2A between a multi-agent system layer and the underlying infrastructure, with a human or application issuing a high-level task, a client agent translating that intent into protocol messages, and a remote agent executing subtasks and returning responses (Duan et al., 17 Aug 2025). Other descriptions emphasize the same client/server decomposition while treating every participant as an autonomous peer that can both request and provide work (Ehtesham et al., 4 May 2025).

Discovery is mediated by Agent Cards published at well-known endpoints or registries. The literature describes both /.well-known/agent.json and /.well-known/agent-card.json, reflecting different implementations and documented versions; discovery may occur through direct HTTPS fetches, registry queries, or private API endpoints for authenticated clients (Duan et al., 17 Aug 2025, Srinivasan, 14 Apr 2026). Recent work also describes A2A versions such as v0.2, v0.2.1, and v0.3.0, indicating an evolving specification family rather than a single frozen wire image (Srinivasan, 14 Apr 2026, Malkapuram et al., 22 Sep 2025, Garigipati et al., 4 May 2026).

A recurring design motif is the reuse of existing web standards rather than a bespoke transport. Published descriptions consistently identify HTTP/HTTPS transport, JSON-RPC messaging, and Server-Sent Events (SSE) for streaming or long-running tasks, with some implementations also using webhooks or message queues (Duan et al., 17 Aug 2025, Habler et al., 23 Apr 2025). This “use-the-web” orientation reduces adoption friction, but it also imports the operational characteristics of web protocols, including header overhead, endpoint authentication complexity, and the need to externalize higher-level trust semantics.

2. Core protocol objects, message semantics, and task lifecycle

The Agent Card is the protocol’s primary self-description object. Across papers, its fields include identifiers and metadata such as id, name, version, endpoint URLs, authentication schemes, and capability or skill declarations. Several works also describe capability-level fields such as skillId, inputSchema, outputSchema, inputModes, outputModes, tags, and examples, making the Agent Card simultaneously a discovery artifact, interface contract, and routing hint (Habler et al., 23 Apr 2025, Li et al., 6 May 2025).

A2A messages are formalized in multiple, compatible ways. One formulation defines a message as a JSON object

m.header={sender_agent_id,recipient_agent_id,task_id,message_id,status,timestamp},m.body=payload,m.parts=[part1,part2,]m.header=\{sender\_agent\_id, recipient\_agent\_id, task\_id, message\_id, status, timestamp\},\quad m.body=\text{payload},\quad m.parts=[part_1,part_2,\dots]

with the set of allowable messages denoted by M={m1,m2,,mk}M=\{m_1,m_2,\dots,m_k\} (Jeong, 2 Jun 2025). Another defines

Message::=(role,parts),Part::=(mimeType,content),\mathrm{Message} ::= (\textit{role}, \textit{parts}), \qquad \mathrm{Part} ::= (\textit{mimeType}, \textit{content}),

and still another identifies three payload classes used in practice: TextPart, FilePart, and DataPart (Li et al., 6 May 2025, Stappen et al., 5 Feb 2026). These descriptions converge on a multipart, modality-aware message model rather than a pure text RPC.

Task execution is modeled as an explicit lifecycle. Different works give different state sets, including S={created, in-progress, completed, failed}S=\{created,\ in\text{-}progress,\ completed,\ failed\}, S={submitted,working,input-required,completed,failed,canceled,unknown}\mathcal{S}=\{\texttt{submitted},\texttt{working},\texttt{input-required},\texttt{completed},\texttt{failed},\texttt{canceled},\texttt{unknown}\}, and session-level states such as {Idle,Discovered,Authenticated,Exchanging,Completed,Error}\{Idle, Discovered, Authenticated, Exchanging, Completed, Error\} (Jeong, 2 Jun 2025, Li et al., 6 May 2025, Duan et al., 17 Aug 2025). The common feature is that A2A is task-oriented rather than statelessly request/response oriented: requests carry task identifiers, state updates may stream asynchronously, and completion may occur via HTTP response, SSE termination, or webhook push.

Published JSON-RPC method names also vary by implementation. The literature includes task.send, task.sendSubscribe, tasks.send, tasks/get, tasks/cancel, and message/send, indicating that method naming has not been completely uniform across described deployments and versions (Ehtesham et al., 4 May 2025, Habler et al., 23 Apr 2025, Garigipati et al., 4 May 2026). This suggests that protocol comprehension requires attention not only to conceptual A2A primitives but also to versioned endpoint conventions.

3. Relation to MCP and integrated agent-tool architectures

A2A is repeatedly contrasted with MCP. In the surveyed architectures, A2A provides inter-agent delegation, while MCP provides typed agent-to-tool or agent-to-resource I/O. One integrated framework places the two in adjacent layers: a user interface layer, an agent management layer, a core protocol layer containing both A2A and MCP runtimes, a tool integration layer for JSON-schema validation, and a security and authentication layer; each agent embeds both an A2A endpoint and an MCP client (Jeong, 2 Jun 2025). The conceptual separation is horizontal versus vertical interoperability: A2A connects agents to agents, while MCP connects agents to tools.

Three recurrent integration patterns are identified in the literature. In the first, an A2A worker embeds an internal MCP client and invokes tools behind the scenes; in the second, MCP tools are exposed directly as A2A skills; in the third, A2A tasks orchestrate multi-step workflows whose internal actions are realized through MCP tool chains (Li et al., 6 May 2025). These patterns provide increasing transparency to the caller, but they also expose a semantic interoperability problem: A2A skill descriptors are often natural-language and weakly structured, whereas MCP tool interfaces are typically grounded in explicit JSON Schema.

Empirical case studies show that this composition is operational rather than merely architectural. A LangGraph-based stock information system combining orchestrator and worker agents with MCP-backed finance, scraping, and database tools is reported with a codebase of approximately 300 lines and end-to-end latency of about 1.2 seconds per complex query (Jeong, 2 Jun 2025). In mobile core network control, A2A coordinates a Host Agent, Monitoring Agent, and Execution Agent, while MCP mediates tool discovery and invocation against service-based interfaces and system-level commands (Garigipati et al., 4 May 2026). The significance is that A2A is not primarily a tool protocol; its practical value emerges when it coordinates specialized agents that may themselves be tool-using composites.

At the same time, the literature is explicit that the A2A+MCP combination introduces new debugging, governance, and security problems rather than eliminating them. The combined stack amplifies semantic mismatches, expands discovery and execution attack surfaces, and complicates end-to-end tracing across multiple JSON-RPC channels (Li et al., 6 May 2025).

4. Capability advertisement, routing, and multimodal semantics

A central issue in A2A research is the meaning of capability advertisement. Several implementations treat capabilities as boolean declarations—an agent lists or does not list a skill—but recent work argues that this assumption is structurally inadequate. One formalization assigns each provider a true reliability ri[0,1]r_i\in[0,1] on a task while current A2A/MCP advertising exposes only a boolean ai{0,1}a_i\in\{0,1\}; under hidden quality, callers observe claims rather than competence, producing a “market for lemons” in which faith-based protocols admit only a low-trust pooling equilibrium (Mittal, 2 Jun 2026). The same work identifies failure modes that are especially relevant under boolean advertising: confident-wrong overclaim, drift, and misrouting. It proposes a Trust Layer above A2A and MCP with probabilistic capability descriptors, benchmark provenance, screening challenges or attestations, and reputation updates such as

repi(t+1)=αrepi(t)+(1α)1{success at t}.\mathit{rep}_i(t+1)=\alpha\,\mathit{rep}_i(t)+(1-\alpha)\,\mathbf{1}\{\text{success at }t\}.

A related misconception concerns multimodality. A2A already supports raw multimodal payloads through message parts and Agent Card mode declarations, yet many deployments collapse non-text inputs into text before forwarding. This “Text-Bottleneck” pattern converts speech through speech-to-text and images through captioning, even though downstream agents could accept raw audio or image FileParts (Srinivasan, 14 Apr 2026). An A2A extension called MMA2A exploits inputModes and outputModes to route voice, image, and text natively. On the CrossModal-CS benchmark, with the same LLM backend and only routing changed, MMA2A achieves 52% task completion accuracy versus 32% for the text-bottleneck baseline, with a 95% bootstrap confidence interval on Δ\DeltaTCA of M={m1,m2,,mk}M=\{m_1,m_2,\dots,m_k\}0 percentage points and McNemar’s exact M={m1,m2,,mk}M=\{m_1,m_2,\dots,m_k\}1; the gain disappears entirely under a keyword-matching ablation, where both pipelines obtain 36% (Srinivasan, 14 Apr 2026).

The combined implication is that capability metadata in A2A is not merely documentary. It affects planner routing, determines what information survives agent boundaries, and, when too coarse, can systematically degrade both trust and accuracy. Recent work therefore treats routing policy and capability semantics as first-order design variables rather than peripheral registry metadata (Srinivasan, 14 Apr 2026, Mittal, 2 Jun 2026).

5. Security, trust, and accountability

A2A’s baseline security model is usually described in terms of HTTPS, OAuth 2.0, JWTs, mutual TLS, scoped authorization, and replay-resistant task handling (Habler et al., 23 Apr 2025). However, multiple papers argue that endpoint authentication alone is insufficient. A comparative threat analysis identifies four protocol-specific A2A risk surfaces as Medium in likelihood and Medium in impact: absence of token lifetime limitations, insufficiently granular token scopes, rug pulls via dynamic capability shift, and shadowing attacks (Anbiaee et al., 11 Feb 2026). Secure-development guidance correspondingly recommends digital signatures on Agent Cards, strict JSON schema validation, nonce and timestamp freshness checks, rate limiting, mTLS or mutual OAuth flows, and audit logging tied to authenticated principals (Habler et al., 23 Apr 2025).

In safety-critical assistants, the security model becomes explicitly human-centric. For vehicle-embedded LLM assistants, A2A is treated as a structured protocol carrying TextPart, FilePart, and DataPart payloads between in-car and external agents; the principal concern is that A2A authenticates endpoints but does not semantically validate content, so a compromised external agent can inject natural-language payloads into the in-car assistant’s context as if they originated from the driver (Stappen et al., 5 Feb 2026). AgentHeLLM formalizes the ecosystem as a directed graph

M={m1,m2,,mk}M=\{m_1,m_2,\dots,m_k\}2

and distinguishes poison paths from trigger paths when analyzing harms such as driver distraction, unauthorized actuator control, false warnings, privacy loss, and economic damage (Stappen et al., 5 Feb 2026).

Sensitive-data handling exposes another limitation. One enhancement proposal introduces a USER_CONSENT_REQUIRED task state, ephemeral scoped tokens with lifetime

M={m1,m2,,mk}M=\{m_1,m_2,\dots,m_k\}3

and direct user-to-service channels that bypass intermediate agents for highly sensitive payloads (Louck et al., 18 May 2025). In prompt-injection tests using nine adversarial prompts and five runs per prompt, the baseline agent leaked simulated secrets at rates between 60% and 100%, whereas the enhanced configuration achieved 0% leak rate across 45 attempts, with worst-case latency overhead reported as +0.5 seconds (Louck et al., 18 May 2025).

Accountability is further distinguished from authentication in consent-oriented work. Anumati’s Agent Consent and Adherence Protocol (ACAP) adds three append-only primitives—PolicyDocument, ConsentRecord, and AdherenceEvent—to A2A, with TLA+-specified safety properties such as NoSkillWithoutConsent, AdherenceAnchored, and NoSkillOnCapabilityDrift (Kadaboina, 16 Apr 2026). This addresses a specific gap: OAuth or mTLS can establish who may call a skill, but not whether the caller understood and adhered to the callee’s policy conditions. In the reference implementation, chain and trail validation are reported in microseconds, well under 1% of a 20–100 ms A2A round trip (Kadaboina, 16 Apr 2026).

6. Operational performance and deployment environments

A2A’s operational profile depends strongly on deployment context. In edge-computing analysis, A2A’s average per-message overhead is estimated at 1–2 KB, with simple request/response latency on the order of 100–200 ms under 10 Mbps wireless links and 20 ms RTT; registry discovery round trips of 50–100 ms can dominate short tasks, and frequent SSE updates may saturate narrowband links such as 5 MHz LTE channels (Duan et al., 17 Aug 2025). The same study concludes that A2A’s reliance on JSON, HTTP, and SSE provides heterogeneity and dynamicity advantages, but leaves unresolved gaps in resource-aware discovery, mobility handover, and lightweight serialization for constrained edge devices.

In network-control experiments, the protocol overhead of A2A itself is reported as small relative to LLM execution time. A mobile core prototype using a Host Agent, Monitoring Agent, and Execution Agent measures mean latencies of 2.35 seconds for Host-Agent reasoning and delegation, 4.50 seconds for the Monitoring Agent, 4.99 seconds for the Execution Agent, and approximately 0.98 seconds for aggregate A2A card retrieval and delegations, yielding mean end-to-end latency of 12.81 seconds with standard deviation 1.39 seconds (Garigipati et al., 4 May 2026). The paper states that protocol overhead is negligible relative to LLM tool-selection and synthesis stages, and that the dominant source of variability is inference rather than message marshalling (Garigipati et al., 4 May 2026).

Enterprise deployments reveal an additional distinction between protocol compliance and practical interoperability. A Cloud Run A2A Hub for Gemini Enterprise routes among public and IAM-protected downstream A2A agents, a retrieval path, and a general QA path, but must enforce a text-only JSON-RPC compatibility mode because the Gemini Enterprise UI sends acceptedOutputModes=[] and may fail if structured JSON parts are returned (Morita, 26 Jan 2026). On a four-query benchmark covering expense policy, project management, general knowledge, and incident response deadline extraction, the system reports deterministic routing accuracy of 4/4 correct, 0 UI crashes across 20 repeat runs per query, and stable cross-boundary authentication once the correct OIDC audience and invoker permissions are configured (Morita, 26 Jan 2026). This demonstrates that usable A2A systems may require boundary-aware middleware even when upstream and downstream components are individually protocol-conformant.

7. Governance, provenance, and economic extensions

A substantial body of work extends A2A beyond task delegation into identity, provenance, and payment infrastructure. One proposal anchors Agent Cards on distributed ledgers as smart contracts and introduces x402 micropayments at the HTTP layer using 402 Payment Required, X-PAYMENT, and X-PAYMENT-RESPONSE headers (Vaziry et al., 24 Jul 2025). In the reported prototype, average HTTP request latency is 150 ms, on-chain settlement is 1.8 seconds on Sepolia, throughput reaches 200 requests per second with 10 workers, gas per payment transaction is approximately 45,000, and estimated cost per transaction is about \$0.03 (Vaziry et al., 24 Jul 2025). These extensions are presented as a way to make discoverability and compensation first-class protocol concerns rather than external business logic.

A separate line of work addresses non-human identity provenance. “Context Lineage Assurance for Non-Human Identities in Critical Multi-Agent Systems” augments A2A with append-only Merkle logs, signed tree heads, and federated proof servers so that agents and external auditors can verify multi-hop call provenance rather than only the immediate transport session (Malkapuram et al., 22 Sep 2025). It defines leaf hashes

M={m1,m2,,mk}M=\{m_1,m_2,\dots,m_k\}4

and signed tree heads

M={m1,m2,,mk}M=\{m_1,m_2,\dots,m_k\}5

and extends the Agent Card with fields such as agent_id, public_key, identity_proof, and lineage_support (Malkapuram et al., 22 Sep 2025). The stated goal is cryptographic validation of full call chains in regulated settings such as FedRAMP.

Related work extends the broader A2A paradigm into legally and economically binding transactions. ATCP/IP proposes a trustless agent-to-agent transaction layer for intellectual-property exchange on the Story blockchain, with programmable licenses, on-chain audit trails, royalty routing, and off-chain legal wrappers (Muttoni et al., 8 Jan 2025). Although this is not a core A2A wire-format proposal, it illustrates a broader trajectory in which agent communication protocols are coupled to negotiation, settlement, and governance mechanisms.

Taken together, these extensions indicate that A2A is evolving from a transport and delegation substrate into a locus for higher-order protocol concerns: truthful capability signaling, consent and adherence, provenance, payment, and regulated-identity assurance. A plausible implication is that future A2A deployments will be differentiated less by basic JSON-RPC interoperability than by the trust, audit, and economic layers they adopt above the core messaging plane.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (16)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Agent2Agent (A2A).