Agent-as-a-Graph Retrieval
- Agent-as-a-Graph Retrieval is a paradigm that models agents and tools as interconnected graph nodes with explicit relationships.
- It integrates semantic vector search with graph structural reasoning and tailored ranking to improve agent selection and tool orchestration.
- Empirical results demonstrate significant gains in Recall@5 and nDCG@5, highlighting robust, context-aware retrieval performance.
Agent-as-a-Graph Retrieval is a retrieval-augmented paradigm for multi-agent LLM systems wherein agents, subagents, or their discrete capabilities (e.g., tools, protocols) are explicitly modeled as nodes and edges in a knowledge graph. This approach enables joint, fine-grained retrieval over compositional agent/tool bundles by integrating semantic vector search with graph structural reasoning and tailored ranking strategies. By leveraging the explicitly structured relationships among agents, tools, subcomponents, and their capabilities, agent-as-a-graph retrieval supports substantially improved agent selection, tool orchestration, and robust, context-aware reasoning in complex LLM multi-agent environments (Nizar et al., 22 Nov 2025).
1. Knowledge Graph Modeling of Agent Systems
Agent-as-a-Graph retrieval formalizes the system of agents and their toolkits as a bipartite knowledge graph $G = (V, E)$, where $V = A \cup T$, with $A$ the set of parent agent nodes and $T$ the set of tool nodes. Edges $E \subseteq T \times A$ capture “ownership” or association: formally, a tool $t \in T$ is owned by agent $a \in A$ if and only if $(t, a) \in E$.
Each node carries:
- Textual name and description,
- Precomputed embedding $e_v \in \mathbb{R}^d$,
- Optional metadata (capability vectors, access-policy tags).
Node types are encoded with a type label $\tau(v) \in \{\text{Agent}, \text{Tool}\}$, and only text–vector mappings are indexed for retrieval. This bipartite construction makes agent–tool relationships first-class citizens in retrieval, enabling traversal and ranking at multiple levels of granularity (agent or tool).
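As a concrete illustration of this modeling, the toy instance below encodes two agents and three tools with ownership edges in plain Python. The agent and tool names are hypothetical, and the dict-based encoding is only one possible representation, not the paper's data format.

```python
# Illustrative toy instance of the bipartite agent-tool graph described above.
# Names and the dict encoding are hypothetical, for exposition only.
agents = {
    "weather_agent": {"description": "Answers weather and forecast questions."},
    "github_agent":  {"description": "Interacts with GitHub repositories."},
}
tools = {
    "get_forecast": {"description": "Return a multi-day forecast for a city.",
                     "owner": "weather_agent"},
    "current_temp": {"description": "Return the current temperature for a city.",
                     "owner": "weather_agent"},
    "list_issues":  {"description": "List open issues for a repository.",
                     "owner": "github_agent"},
}
# Ownership edges E ⊆ T × A, recovered from the "owner" field:
edges = {(t, meta["owner"]) for t, meta in tools.items()}
```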
2. Retrieval Pipeline Composition and Algorithms
The retrieval process for agent selection proceeds in three modular stages:
- Vector Search: The system retrieves the top-$N$ relevant agents and tools independently by encoding the query and each node into the same embedding space and computing cosine similarity. For a query $q$ with embedding $e_q$, each node $v$ is scored by $\mathrm{sim}(q, v) = \frac{\langle e_q, e_v \rangle}{\lVert e_q \rVert\,\lVert e_v \rVert}$, and the top-$N$ candidates are collected from both the tool and agent corpora.
- Type-Specific Weighted Reciprocal Rank Fusion (wRRF): Retrieved candidates are merged and reranked to balance agent-level and tool-level relevance. Each element $e$ in the joint candidate list receives a score $s(e) = \alpha_{\tau(e)} / (k + r(e))$, where $r(e)$ is the global rank, $k$ is a damping constant (e.g., 60), and $\alpha_A, \alpha_T$ are the type weights for agent and tool nodes. Choices for these parameters modulate the fusion between fine-grained (tool) and coarse-grained (agent) matches (a worked example follows this list); empirical ablation identifies an optimal setting for these weights (Nizar et al., 22 Nov 2025).
- Graph Traversal and Agent Collapsing: After reranking, the system traverses the candidate list: agent nodes are selected directly, while each tool node is promoted to its parent agent via the ownership edge. Unique parent agents are collected until $K$ are assembled. This ensures that fine-grained tool hits are mapped to actionable agent candidates, preserving “bundle context” for downstream orchestration.
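As a worked illustration of the wRRF score with hypothetical weights (not the paper's tuned values): take $k = 60$, $\alpha_T = 1.0$, and $\alpha_A = 0.7$. A tool at global rank $r = 1$ scores $1.0/(60+1) \approx 0.0164$, while an agent at rank $r = 3$ scores $0.7/(60+3) \approx 0.0111$, so the tool hit is ranked ahead and its parent agent is promoted first during graph traversal.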
High-Level Pseudocode
```
Input:      query q
Parameters: N ≫ K, α_A, α_T, k
Corpora:    C_A (agents), C_T (tools)

1. Vector Search:
   L_T ← TopN_by_cosine(q, C_T, N)
   L_A ← TopN_by_cosine(q, C_A, N)
   L   ← merge(L_T, L_A)

2. wRRF Reranking:
   for e ∈ L:
       if NodeType(e) == Tool:  score(e) ← α_T / (k + r(e))
       else:                    score(e) ← α_A / (k + r(e))
   sort L by score, descending

3. Graph Traversal:
   A_star ← empty list
   for e ∈ L:
       if NodeType(e) == Agent: a ← e
       else:                    a ← owner(e)
       if a ∉ A_star:
           append a to A_star
           if |A_star| == K: break
   return A_star
```
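The Python sketch below shows one way this pipeline could be realized over an in-memory graph. It is a minimal illustration, not the paper's implementation: the `Node` dataclass, the helper names, and the reading of the global rank $r(e)$ as the cosine-similarity rank over the merged candidate list are all assumptions of this summary.

```python
from __future__ import annotations

from dataclasses import dataclass
import numpy as np

@dataclass
class Node:
    node_id: str
    node_type: str            # "Agent" or "Tool"
    text: str                 # concatenated name + description
    embedding: np.ndarray
    owner: str | None = None  # parent agent id; set for Tool nodes only

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def top_n(query_emb: np.ndarray, corpus: list[Node], n: int) -> list[Node]:
    """Stage 1: independent vector search over one corpus (agents or tools)."""
    return sorted(corpus, key=lambda v: cosine(query_emb, v.embedding), reverse=True)[:n]

def retrieve_agents(query_emb: np.ndarray,
                    agents: list[Node], tools: list[Node],
                    agent_index: dict[str, Node],
                    N: int = 50, K: int = 5,
                    alpha_a: float = 1.0, alpha_t: float = 1.0,
                    k: int = 60) -> list[Node]:
    # Stage 1: vector search over both corpora, then merge.
    merged = top_n(query_emb, tools, N) + top_n(query_emb, agents, N)

    # Global rank r(e): rank of each candidate in the merged list by cosine
    # similarity (one possible reading of "global rank"; an assumption here).
    by_sim = sorted(merged, key=lambda v: cosine(query_emb, v.embedding), reverse=True)

    # Stage 2: type-specific weighted reciprocal rank fusion (wRRF).
    def wrrf_score(rank: int, node: Node) -> float:
        alpha = alpha_t if node.node_type == "Tool" else alpha_a
        return alpha / (k + rank)

    reranked = sorted(((wrrf_score(r, v), v) for r, v in enumerate(by_sim, start=1)),
                      key=lambda pair: pair[0], reverse=True)

    # Stage 3: graph traversal -- collapse tool hits onto their parent agents,
    # keeping the first K unique agents.
    selected: list[Node] = []
    seen: set[str] = set()
    for _, node in reranked:
        agent = node if node.node_type == "Agent" else agent_index[node.owner]
        if agent.node_id not in seen:
            seen.add(agent.node_id)
            selected.append(agent)
            if len(selected) == K:
                break
    return selected
```

In this sketch, setting `alpha_a == alpha_t` reduces the reranking stage to unweighted RRF over the merged list; the gain attributed to wRRF in the evaluation below comes from allowing the two type weights to differ.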
3. Empirical Benchmarking and Robustness
Agent-as-a-Graph was evaluated on LiveMCPBench, a benchmark comprising 70 Model Context Protocol (MCP) servers (“agents”), 527 tools, and 95 real-world multi-step questions (on average 2.68 steps, 2.82 tools, and 1.40 agents per question). Metrics include Recall@K, mean Average Precision (mAP@K), and nDCG@K.
| Method | Recall@5 | nDCG@5 |
|---|---|---|
| Agent-as-a-Graph | 0.85 | 0.47 |
| ScaleMCP (SOTA base) | 0.74 | 0.41 |
Relative improvements: +14.9% Recall@5, +14.6% nDCG@5 over ScaleMCP. Integration of wRRF yields +2.4% Recall@5 over unweighted RRF. Robustness is confirmed by replicating ~19.4% Recall@5 gains across eight distinct embedding models.
Ablation on fusion weights demonstrates that over-emphasizing either tool- or agent-level rescores degrades overall performance, confirming the necessity of calibrated bipartite candidate fusion (Nizar et al., 22 Nov 2025).
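For reference, the Recall@K and nDCG@K figures above follow the standard binary-relevance definitions sketched below. This is a generic formulation with illustrative function names, not the benchmark's own evaluation harness.

```python
import math

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k retrieved list."""
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG@k: discounted gain normalized by the ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(retrieved[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0
```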
4. Comparative Context and Related Approaches
Agent-as-a-Graph retrieval extends beyond flat selection based on agent descriptions or homogeneous RAG retrieval. Compared to:
- Single-agent matching: Loses tool granularity, leading to inferior context alignment.
- GraphRAG/Graph-based RAG: Most prior methods either aggregate over nodes without hierarchical promotion or lack type-level fusion. Agent-as-a-Graph’s bipartite model and type-aware reranking enable more precise orchestration in multi-agent, tool-rich settings.
- Previous SOTA Retrievers (e.g., ScaleMCP, MCPZero, RRF-based): These do not explicitly map tools back to agents through knowledge graph structure and either normalize only at the agent level or ignore fine-grained tool hits.
In broader GraphRAG literature, agent-graph representations have been explored in diverse settings, including persona-based systems with heterogeneous graphs (Liang et al., 21 Nov 2025), adaptive exploration in Graph Counselor’s tri-agent model (Gao et al., 4 Jun 2025), multi-tool KGQA with iterative LLM/tool pipelines (Mavromatis et al., 5 Jul 2025), and explicit path-constraint retrieval for logical chain integrity (Oladokun, 23 Nov 2025). These variants share the principle of harnessing graph topology to tie granular retrieval units to agentic or workflow-level reasoning.
5. Key Limitations and Future Research Directions
Notable current limitations include:
- The graph is strictly bipartite (tool → agent); the model does not account for inter-tool, inter-agent, or higher-order relationships.
- Static, globally fixed fusion weights ($\alpha_A$, $\alpha_T$) may misalign with query intent; per-query adaptation is lacking.
- All embeddings must be precomputed; cold-start for novel tools/agents requires a re-indexing cycle.
Proposed directions for future work include:
- Query-adaptive type weighting, either by meta-learning or by prompting LLMs to recommend fusion strategies.
- Incorporating richer edge types, allowing reasoning over co-occurrence or hierarchical agent organization.
- Hybrid retrieval via integration of dense semantic scores with graph-theoretic signals such as centrality or minimal path length.
- Incremental, dynamic graph updates to accommodate evolving tool/agent catalogs without necessitating a global index rebuild.
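As an illustration of the first direction, a query-adaptive weighting rule might look like the sketch below. This is purely speculative: the paper proposes the direction without prescribing a mechanism, and the keyword heuristic here is a stand-in for a learned or LLM-driven policy.

```python
# Speculative sketch of query-adaptive type weighting for the wRRF stage.
# The hint list and thresholds are illustrative only.
ACTION_HINTS = ("create", "delete", "list", "send", "fetch", "update")

def adaptive_weights(query: str) -> tuple[float, float]:
    """Return per-query (alpha_A, alpha_T) for the wRRF reranking stage."""
    q = query.lower()
    if any(hint in q for hint in ACTION_HINTS):
        return 0.7, 1.3   # query names a concrete action -> weight tool hits up
    return 1.2, 0.8       # query describes a broad capability -> weight agent hits up
```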
These directions would enable more flexible, efficient, and context-sensitive retrieval strategies for dynamically composed and vastly scaled agent systems (Nizar et al., 22 Nov 2025).
6. Impact on Multi-Agent LLM System Design
Agent-as-a-Graph retrieval offers direct improvements in system modularity, compositionality, and reasoning depth for large-scale, distributed LLM agent ecosystems. By embedding the retrieval function in a knowledge graph that jointly encodes agent and tool topology, the approach supports:
- Aggregation of fine-grained capabilities within agent bundles for better tool match and workflow assembly.
- Transparent and controllable agent selection, with explicit mapping of tool-level evidence to actionable agent invocations.
- Robustness across embedding backends, benchmarks, and agent configurations, enabling broad platform applicability.
This retrieval formalism marks a significant shift from monolithic or sequential agent selection strategies, embedding the very notion of agent–tool compositionality and context into the retrieval substrate itself. This architectural advance underpins scalable, transparent, and high-recall routing in increasingly complex LLM multi-agent deployments (Nizar et al., 22 Nov 2025).