Comprehensive Agentic Data Proxy

Updated 4 July 2026

This entry covers the Comprehensive Agentic Data Proxy, an intermediary layer that mediates access to heterogeneous data and supports dynamic, agent-driven multi-step reasoning.
It integrates retrieval, evidence synthesis, provenance tracking, and privacy-preserving execution using adaptive control loops and agent feedback.
Its architectural families (single-agent, multi-agent, hierarchical, and corrective, among others) target performance, governance, and learning in real time.

A Comprehensive Agentic Data Proxy is a unified intermediary layer that sits between users or agentic applications and heterogeneous, dynamic data sources or execution environments.
In the Agentic RAG literature, it is defined as a system of autonomous AI agents that mediate access to vector indexes, SQL and NoSQL databases, web and API endpoints, and knowledge graphs while performing retrieval, multi-step reasoning, evidence synthesis, validation or critique, citation or provenance, and adaptive workflow orchestration through planning, reflection, tool use, and multi-agent collaboration (Singh et al., 15 Jan 2025). Related work broadens the same idea into an operational core for dynamic multi-agent workloads, a provenance runtime layer, a privacy-preserving execution environment, and an evaluation substrate, so the notion now spans data access, control, governance, and learning rather than retrieval alone (Giurgiu et al., 10 Dec 2025, Souza et al., 4 Aug 2025, Stanley et al., 21 Apr 2026).

1. Conceptual scope and historical development

Traditional Retrieval-Augmented Generation is described as a linear, static pipeline—retrieval, augmentation, then generation—and its reported limitations include weak contextual integration, limited multi-hop reasoning, and scalability or latency issues (Singh et al., 15 Jan 2025). Agentic RAG reframes that pipeline by embedding autonomous agents that dynamically plan, reflect, select tools, route queries to sources, iterate retrieval, critique outputs, and adapt the workflow. The Comprehensive Agentic Data Proxy is the synthesis of these patterns into a context-aware, scalable mediation layer for dynamic environments (Singh et al., 15 Jan 2025).

A complementary shift appears in the Agent-Centric Data Fabric literature, which argues that agentic systems generate dynamic, context-driven, collaborative, non-deterministic, and multi-modal workloads that strain conventional query optimizers and caching mechanisms. In that view, the proxy optimizes behaviors instead of queries, treats context, intent, and intermediate artifacts as first-class signals, and learns from agent feedback to minimize redundant queries, data movement, and inference load (Giurgiu et al., 10 Dec 2025). This behavior-first framing moves the proxy away from conventional middleware and closer to an adaptive collaborator.

The concept also expands beyond retrieval. In online reinforcement-learning systems for deployed agents, the proxy is defined as enterprise middleware that intercepts agent decisions at stable execution boundaries, converts them into standardized, step-granular trajectories with learning signals, enforces privacy and compliance, and persists replayable experience for training, off-policy evaluation, and automated evolution decisions (Yan et al., 1 Jul 2026). Taken together, these strands suggest that a comprehensive proxy is not merely a retriever front-end; it is a runtime substrate for mediation, observability, and controlled adaptation.

2. Architectural families and control loops

The most explicit taxonomy comes from Agentic RAG, which organizes systems into several recurrent architectural families (Singh et al., 15 Jan 2025).

Architecture family	Distinguishing mechanism	Typical proxy use
Single-Agent Router	One agent performs query analysis, source routing, and synthesis	Simpler systems with limited tools
Multi-Agent Systems	Coordinator plus specialized retrieval agents	Parallel retrieval and specialization
Hierarchical Agentic RAG	Top-tier strategic controller with delegated lower tiers	Reliability, cost, and source prioritization
Corrective RAG	Relevance evaluation, query refinement, external retrieval, iterative correction	Retrieval repair and critique
Adaptive RAG	Classifier selects no-retrieval, single-step, or multi-step pathway	Complexity-aware routing
Graph-based Agentic RAG	Graph KB traversal with critic and feedback loops	Multi-hop relational reasoning
Agentic Document Workflows	Stateful document-centric orchestration	Structured outputs and provenance tracking

These families map cleanly onto proxy functions. Query understanding can be handled by a router agent, hierarchical top-tier planning, an adaptive classifier, or a multi-agent planner. Retrieval strategy adaptation appears as Adaptive RAG pathway selection, CRAG-style query refinement, graph expansion in Agent-G or GeAR, or hierarchical source prioritization. Validation and critique recur as reflection loops, critic roles, relevance evaluation, and monitor agents (Singh et al., 15 Jan 2025).
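The Adaptive RAG pathway selection described above can be illustrated with a small router. This is a minimal sketch, assuming a keyword heuristic in place of the trained complexity classifier; the names `Pathway` and `classify_complexity` and both cue lists are hypothetical and do not come from the survey.

```python
from enum import Enum


class Pathway(Enum):
    NO_RETRIEVAL = "no_retrieval"
    SINGLE_STEP = "single_step"
    MULTI_STEP = "multi_step"


# Hypothetical cue lists; a real system would use a trained classifier.
MULTI_HOP_CUES = ("compare", " and then ", "relationship between", "why did")
LOOKUP_CUES = ("who", "when", "where", "what is", "latest")


def classify_complexity(query: str) -> Pathway:
    """Toy stand-in for the complexity classifier that picks a pathway."""
    q = query.lower()
    # Multi-hop cues are checked first so that complex questions are not
    # downgraded to a single lookup.
    if any(cue in q for cue in MULTI_HOP_CUES):
        return Pathway.MULTI_STEP
    if any(cue in q for cue in LOOKUP_CUES):
        return Pathway.SINGLE_STEP
    # Nothing suggests external evidence is needed.
    return Pathway.NO_RETRIEVAL
```

Checking the multi-step cues first is a design choice of this sketch: misrouting a complex question to a single retrieval step is assumed to cost more than an unnecessary extra retrieval round.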

The corresponding end-to-end pipeline is explicitly stepwise: intent parsing, planning, tool selection, retrieval, reranking, synthesis, reflection or critique, finalization with citations or provenance, and logging or telemetry (Singh et al., 15 Jan 2025). ReAct-style think–act loops are used when reasoning and tool calls must interleave; planner–executor–critic patterns are used for complex multi-step tasks; and negotiation or arbitration patterns are used when specialized agents disagree or when graph, multimodal, and text evidence must be merged (Singh et al., 15 Jan 2025).

Control loops are stateful. Short-term conversation state, long-term memory, prompt scratchpads, tool-call transcripts, and intermediate results are maintained in LangGraph-like state graphs with loops and persistence (Singh et al., 15 Jan 2025). In the data-fabric setting, agent-local micro-caches, shared semantic caches, predictive prefetchers, and quorum-based serving coordinators make those loops explicitly cross-agent and cross-engine (Giurgiu et al., 10 Dec 2025). A specialized variant is "DynaTree" (Qi et al., 29 May 2026), which decouples offline agentic semantic expansion from online retrieval decisions by materializing a reusable retrieval tree and then selecting a daily subtree through a time-localized evaluation proxy. This removes online agent execution, tree modification, and retraining while preserving adaptation to a changing corpus (Qi et al., 29 May 2026).

3. Data models, interfaces, and formal objectives

The data proxy literature increasingly specifies the request, state, and storage abstractions that agents expose. In the Agent-Centric Data Fabric, contextual requests carry fields such as agent_id, task_id, intent_text, context_embedding, tool_usage_metadata, modality descriptors, constraints, and intermediate artifacts, plus attention hints, federation descriptors, and provenance or lineage signals (Giurgiu et al., 10 Dec 2025). The storage plane includes vector indices for queries, contexts, schemas, partitions, and artifacts; key-value micro-caches keyed by semantic hash plus embedding; a shared semantic cache of reusable embeddings and verified results; optional knowledge-graph links; document stores; and connectors to databases, APIs, streams, and inference endpoints (Giurgiu et al., 10 Dec 2025).
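The contextual request described above could be modeled roughly as follows. This is a sketch only: the field names follow the list in the text, but the types, the defaults, and the `semantic_hash` helper (a SHA-256 digest of the normalized intent text) are assumptions for illustration, not the schema from the paper.

```python
from dataclasses import dataclass, field
import hashlib
from typing import Any


@dataclass
class ContextualRequest:
    agent_id: str
    task_id: str
    intent_text: str
    context_embedding: list[float]
    tool_usage_metadata: dict[str, Any] = field(default_factory=dict)
    modalities: list[str] = field(default_factory=list)
    constraints: dict[str, Any] = field(default_factory=dict)
    intermediate_artifacts: list[str] = field(default_factory=list)
    attention_hints: list[str] = field(default_factory=list)
    federation: dict[str, Any] = field(default_factory=dict)
    lineage: list[str] = field(default_factory=list)

    def semantic_hash(self) -> str:
        """Hypothetical cache-key component: hash of the normalized intent."""
        normalized = " ".join(self.intent_text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


req = ContextualRequest(
    agent_id="agent-7",
    task_id="task-42",
    intent_text="Quarterly churn by region",
    context_embedding=[0.1, 0.3, 0.5],
)
# The micro-cache key pairs the hash with the embedding, as the text describes.
cache_key = (req.semantic_hash(), tuple(req.context_embedding))
```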

A mathematically explicit formulation appears in several places. Agentic RAG defines a retrieval set $S_k = R_\theta(q, D, k)$ and generation $y = G_\phi(q, S_k)$ under a utility objective with cost:

$\max_{k,\;\pi} \;\mathbb{E}[U(y, S_k)] - \lambda\, C(k)$

subject to latency $\le T_{\max}$ and budget $\le B$ (Singh et al., 15 Jan 2025). The data-fabric work defines attention-guided retrieval through embedding-based soft attention,

$\alpha_i = \frac{\exp\big( (q\cdot k_i)/\sqrt{d} \big)}{\sum_j \exp\big( (q\cdot k_j)/\sqrt{d} \big)},$

and semantic micro-cache admission through a utility score

$U(d) = w_r\,R(d) + w_f\,F(d) + w_t\,e^{-\lambda \Delta t(d)} - w_c\,C(d),$

thereby making routing, caching, and prefetching explicit optimization targets rather than heuristic side effects (Giurgiu et al., 10 Dec 2025).
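Both data-fabric formulas can be evaluated directly. The sketch below is a plain-Python illustration; reading $R$, $F$, $\Delta t$, and $C$ as relevance, access frequency, age, and cost is an assumption, and the default weights and decay rate are made-up values chosen only to make the example concrete.

```python
import math


def attention_weights(q, keys):
    """Soft attention: softmax over scaled dot products q.k_i / sqrt(d)."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cache_utility(relevance, frequency, age, cost,
                  w_r=1.0, w_f=0.5, w_t=0.5, w_c=0.25, decay=0.1):
    """U(d) = w_r*R + w_f*F + w_t*exp(-lambda*dt) - w_c*C (weights assumed)."""
    return (w_r * relevance + w_f * frequency
            + w_t * math.exp(-decay * age) - w_c * cost)


weights = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
fresh = cache_utility(relevance=0.8, frequency=0.5, age=0.0, cost=0.2)
stale = cache_utility(relevance=0.8, frequency=0.5, age=30.0, cost=0.2)
```

In this example the key aligned with the query receives the larger weight, and the older cache entry scores lower than the fresh one because only the recency term differs.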

Protocol standardization is emerging at two levels. For learning-oriented deployments, the Agent Trajectory Data Protocol records step-granular observation, hidden_state, action, outcome, reward, termination, policy metadata, governance fields, and lineage, so replay, off-policy evaluation, and evolution control are possible across heterogeneous agent paradigms (Yan et al., 1 Jul 2026). For inter-agent communication, the Agent Network Protocol proposes a three-layer stack: a DID-based identity and encrypted communication layer, a meta-protocol negotiation layer, and an application layer centered on the JSON-LD Agent Description Protocol and .well-known/agent-descriptions discovery (Chang et al., 18 Jul 2025). This suggests a broader interoperability role for data proxies: not only brokering data access, but also brokering identity, capability discovery, and protocol negotiation across the Agentic Web.
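A step-granular trajectory record of the kind listed above might look like the following. This is a sketch under assumptions: the field names mirror the text, while the types, the JSON Lines encoding, and the helper names `to_jsonl` and `from_jsonl` are hypothetical and not part of the published protocol.

```python
from dataclasses import dataclass, field, asdict
import json
from typing import Any, Optional


@dataclass
class TrajectoryStep:
    observation: Any
    action: Any
    outcome: Any
    reward: float
    terminated: bool
    hidden_state: Optional[Any] = None
    policy_metadata: dict = field(default_factory=dict)
    governance: dict = field(default_factory=dict)
    lineage: list = field(default_factory=list)


def to_jsonl(steps):
    """Serialize one step per line so trajectories can be appended and replayed."""
    return "\n".join(json.dumps(asdict(s), sort_keys=True) for s in steps)


def from_jsonl(text):
    """Rebuild steps from the JSON Lines form (values must be JSON-safe)."""
    return [TrajectoryStep(**json.loads(line)) for line in text.splitlines() if line]


steps = [
    TrajectoryStep({"query": "refund status"}, "call:orders_api", "200 OK", 0.0, False),
    TrajectoryStep({"rows": 1}, "respond", "accepted", 1.0, True,
                   governance={"pii_redacted": True}),
]
replayed = from_jsonl(to_jsonl(steps))
```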

4. Provenance, privacy, authorization, and runtime governance

A defining property of a comprehensive proxy is that it makes agentic behavior traceable. "PROV-AGENT" (Souza et al., 4 Aug 2025) extends W3C PROV so that AIAgent, AgentTool, AIModelInvocation, Prompt, ResponseData, and related entities become first-class provenance objects linked to workflow tasks, data artifacts, telemetry, scheduling data, and execution locations. The resulting provenance graph supports decision-chain queries, conflict detection, downstream impact analysis, and cross-facility lineage across edge, cloud, and HPC environments (Souza et al., 4 Aug 2025). This turns the proxy into a queryable explanation layer rather than a mere logging sink.
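A decision-chain query over such a graph amounts to walking provenance edges backwards from an artifact. The sketch below uses standard W3C PROV relation names (`wasGeneratedBy`, `wasAssociatedWith`, `wasInformedBy`, `used`), but the `ProvGraph` class and the node identifiers are hypothetical and far simpler than the PROV-AGENT model itself.

```python
from collections import defaultdict


class ProvGraph:
    def __init__(self):
        # subject -> list of (relation, object) pairs
        self.edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def upstream(self, node):
        """Return every node reachable by following provenance edges from `node`."""
        seen, stack = [], [node]
        while stack:
            current = stack.pop()
            for _, obj in self.edges.get(current, []):
                if obj not in seen:
                    seen.append(obj)
                    stack.append(obj)
        return seen


g = ProvGraph()
g.add("report.csv", "wasGeneratedBy", "tool_call_1")
g.add("tool_call_1", "wasAssociatedWith", "agent:planner")
g.add("tool_call_1", "wasInformedBy", "llm_invocation_1")
g.add("llm_invocation_1", "used", "prompt_1")

chain = g.upstream("report.csv")
```

Here `chain` contains the tool call, the agent, the model invocation, and the prompt that led to `report.csv`, which is the kind of answer a decision-chain query is meant to return.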

Runtime governance introduces a second control plane. "MI9 -- Agent Intelligence Protocol" (Wang et al., 5 Aug 2025) defines an Agency-Risk Index,

$\text{ARI} = \frac{1}{3}\sum_{d=1}^{3}\left(\frac{1}{12}\sum_{c=1}^{4} s_{d,c}\right),$

continuous authorization monitoring, an Agentic Telemetry Schema, FSM-based real-time conformance checking, goal-conditioned drift detection via Jensen–Shannon divergence and Mann–Whitney tests, and graduated containment from monitoring to execution isolation (Wang et al., 5 Aug 2025). On 1,033 synthetic scenarios, MI9 reports a detection rate of 99.81%, a false positive rate of 0.0121%, and a risk coverage rate of 94.41% (Wang et al., 5 Aug 2025). In proxy form, these mechanisms provide in-session governance over cognitive, action, and coordination events.
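The index and the divergence test can be computed in a few lines. This sketch assumes that each component score $s_{d,c}$ lies in the range 0 to 3, which the $1/12$ normalizer suggests but the text does not state; the function names are hypothetical, and the Jensen–Shannon divergence uses its standard base-2 definition.

```python
import math


def agency_risk_index(scores):
    """ARI = (1/3) * sum over 3 dimensions of (1/12) * sum of 4 component scores."""
    if len(scores) != 3 or any(len(row) != 4 for row in scores):
        raise ValueError("expected 3 dimensions x 4 components")
    return sum(sum(row) / 12 for row in scores) / 3


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


ari = agency_risk_index([[3, 2, 1, 0], [2, 2, 2, 2], [0, 1, 1, 0]])
# Compare a baseline action distribution with a drifted one.
drift = js_divergence([0.7, 0.2, 0.1], [0.2, 0.3, 0.5])
```

With base-2 logarithms the divergence is bounded between 0 (identical distributions) and 1 (disjoint support), which makes a fixed drift threshold easier to interpret.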

Security-sensitive deployments add cryptographic intent binding. "Agentic JWT" (Goswami, 16 Sep 2025) binds each request to user-approved intent, workflow step, agent identity checksum, chained delegation assertions, and proof-of-possession keys, with a gateway validating signature, step binding, delegation chain, scope, and replay constraints. Its proof-of-concept reports functional blocking of scope-violating requests, replay, impersonation, and prompt-injection pathways with sub-millisecond overhead on commodity hardware (Goswami, 16 Sep 2025). This is complementary to data mediation: the proxy becomes both a data path and an authorization enforcement point.
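The gateway checks can be pictured with a heavily simplified stand-in. The sketch below is not the Agentic JWT format: it assumes a shared-secret HMAC instead of proof-of-possession keys, omits delegation-chain validation entirely, and uses hypothetical claim names (`jti`, `step`, `scopes`) and a hypothetical `Gateway` class.

```python
import hashlib
import hmac
import json


def sign(claims: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the claims (illustrative scheme only)."""
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


class Gateway:
    def __init__(self, key: bytes):
        self.key = key
        self.seen_ids = set()  # token ids already accepted, for replay checks

    def validate(self, claims: dict, signature: str, step: str, scope: str) -> str:
        if not hmac.compare_digest(sign(claims, self.key), signature):
            return "bad_signature"
        if claims.get("step") != step:
            return "step_mismatch"
        if scope not in claims.get("scopes", []):
            return "scope_violation"
        token_id = claims.get("jti")
        if token_id is None:
            return "missing_token_id"
        if token_id in self.seen_ids:
            return "replay"
        self.seen_ids.add(token_id)
        return "ok"


key = b"demo-key"
claims = {"jti": "t1", "step": "fetch_invoice",
          "scopes": ["invoices:read"], "agent": "abc123"}
gateway = Gateway(key)
first = gateway.validate(claims, sign(claims, key), "fetch_invoice", "invoices:read")
second = gateway.validate(claims, sign(claims, key), "fetch_invoice", "invoices:read")
```

The first call is accepted and the identical second call is rejected as a replay, because the token id is recorded only after every other check has passed.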

Privacy-preserving execution pushes the model further. GAAP enforces confidentiality of private user data deterministically by interposing on all model and tool calls, applying information-flow control, consulting a Permission DB keyed by private-data item and external party, and maintaining a Disclosure Log and annotation framework across tasks (Stanley et al., 21 Apr 2026). In the reported evaluation, GAAP blocks all tested data disclosure attacks with 0% attack success, while retaining 76.0% utility on the custom 20-task suite versus 81.0% for the non-private baseline (Stanley et al., 21 Apr 2026). This establishes a strong interpretation of the proxy: a controlled execution environment that does not trust the agent, model, or provider with private data unless the user has explicitly authorized that disclosure.
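The Permission DB and Disclosure Log can be pictured as a default-deny lookup in front of every outbound call. This is a toy sketch, assuming a hypothetical `DisclosureGuard` class; it does not implement information-flow control, and logging denied attempts alongside permitted ones is an assumption of the sketch rather than a documented GAAP behavior.

```python
class DisclosureGuard:
    def __init__(self):
        self.permissions = {}     # (data_item, external_party) -> bool
        self.disclosure_log = []  # one entry per attempted disclosure

    def grant(self, item, party):
        """Record that the user authorized sharing `item` with `party`."""
        self.permissions[(item, party)] = True

    def release(self, item, value, party):
        """Return the value only if the (item, party) pair was authorized."""
        allowed = self.permissions.get((item, party), False)  # default deny
        self.disclosure_log.append({"item": item, "party": party, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{item!r} may not be disclosed to {party!r}")
        return value


guard = DisclosureGuard()
guard.grant("home_address", "courier_api")
shipped = guard.release("home_address", "12 Elm St", "courier_api")
try:
    guard.release("home_address", "12 Elm St", "ad_network")
    leaked = True
except PermissionError:
    leaked = False
```

Keying permissions by the pair rather than by the item alone means that authorizing one recipient never implicitly authorizes another.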

5. Evaluation proxies, statistical estimation, and performance optimization

The literature also uses the proxy concept for evaluation itself. Agentic RAG surveys standardize evaluation along information-retrieval metrics such as Precision, Recall, and NDCG; QA and reasoning metrics such as factual consistency, groundedness, answer quality, and multi-hop performance; and system metrics such as latency, throughput, and cost (Singh et al., 15 Jan 2025). Those metrics are increasingly supplemented by explicit proxy mechanisms that stand in for expensive or unavailable ground truth.
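The information-retrieval metrics named above have standard definitions that a proxy's telemetry layer could compute per query. The sketch below uses the common logarithmic-discount form of NDCG; the function names and the toy ranking are illustrative assumptions.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def ndcg_at_k(ranked, gains, k):
    """DCG with a log2(rank + 1) discount, normalized by the ideal ordering."""
    dcg = sum(gains.get(d, 0.0) / math.log2(i + 2)
              for i, d in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c"]
gains = {"a": 1.0, "c": 1.0, "d": 1.0}
p3 = precision_at_k(ranked, set(gains), 3)
r3 = recall_at_k(ranked, set(gains), 3)
n3 = ndcg_at_k(ranked, gains, 3)
```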

"Toward Scalable Verifiable Reward" (Chuang et al., 18 Feb 2026) introduces proxy state-based evaluation, in which an LLM state tracker reconstructs a structured proxy backend state from a full interaction trace and judges verify goal completion and hallucinations against scenario constraints. The framework reports stable, model-differentiating rankings, near-zero simulator hallucination rates under careful scenario specification, and human–LLM judge agreement exceeding 90% (Chuang et al., 18 Feb 2026). "PACE: A Proxy for Agentic Capability Evaluation" (Song et al., 2 Jul 2026) instead predicts expensive benchmark scores from a compact subset of atomic non-agentic instances. Across 14 models, 4 target agentic benchmarks, and 19 source benchmarks, PACE-Bench reports leave-one-out mean absolute error under 4%, Spearman correlation above 0.80, pairwise model-ranking accuracy around 85%, and cost under 1% of full agentic evaluation (Song et al., 2 Jul 2026). These systems do not proxy data serving; they proxy agentic capability measurement.

When proxy labels are biased, statistical debiasing becomes necessary. GLIDE packages prediction-powered inference estimators and samplers so a small set of human annotations can be combined with proxy predictions to yield unbiased estimates with valid confidence intervals (Martinon et al., 29 May 2026). Its agentic evaluation case study on R-Judge reports interval-width reductions of about 16% to 20% at fixed labeling budget, with effective sample sizes increasing to approximately 143, 148, and 157 depending on the estimator (Martinon et al., 29 May 2026). This line of work suggests that a mature agentic data proxy should not only collect evaluation signals but also expose uncertainty-aware, debiased estimates for deployment gating and regression tracking.
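The basic prediction-powered estimator of a mean shows how the combination works: the proxy average over the large unlabeled pool is corrected by the average human-minus-proxy gap on the small labeled set. The sketch below implements that textbook estimator with a normal-approximation interval; it is not GLIDE's API, and the function name `ppi_mean` and the toy data are assumptions.

```python
import math
from statistics import mean, variance


def ppi_mean(labeled_y, labeled_pred, unlabeled_pred, z=1.96):
    """Prediction-powered estimate of E[Y] with an approximate 95% interval.

    estimate = mean(proxy on unlabeled) + mean(human label - proxy on labeled)
    """
    n, big_n = len(labeled_y), len(unlabeled_pred)
    rectifier = [y - f for y, f in zip(labeled_y, labeled_pred)]
    estimate = mean(unlabeled_pred) + mean(rectifier)
    # The two terms come from independent samples, so their variances add.
    var = variance(unlabeled_pred) / big_n + variance(rectifier) / n
    half_width = z * math.sqrt(var)
    return estimate, (estimate - half_width, estimate + half_width)


# Toy data: the proxy judge over-predicts "safe" (1) on one labeled item.
human = [1, 0, 1, 1]
proxy_on_labeled = [1, 1, 1, 1]
proxy_on_unlabeled = [1, 1, 1, 0, 1, 1, 1, 1]
estimate, interval = ppi_mean(human, proxy_on_labeled, proxy_on_unlabeled)
```

The correction term shifts the raw proxy average downward here, because on the labeled items the proxy was more optimistic than the human annotators.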

Optimization remains central on the serving side. The data-fabric literature emphasizes adaptive $k$, selective retrieval, ANN tuning, prefetching, batching, asynchronous pipelines, budget-aware tool calls, telemetry, and drift detection (Giurgiu et al., 10 Dec 2025). AgenticData adds a semantic plan cost model,

$C(P) = \sum_{o \in \text{sem-ops}(P)} \text{Card}(o)\cdot\left(|I_o|\cdot \text{Fee}_{in} + |O_o|\cdot \text{Fee}_{out}\right),$

together with rule-based optimization, dynamic-programming join ordering, and quality-aware LLM selection for heterogeneous analytics over structured and unstructured data (Sun et al., 7 Aug 2025).
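The cost model can be used to compare candidate plans before execution. The sketch below assumes that $|I_o|$ and $|O_o|$ are per-row input and output token counts and that fees are per-token prices; the `SemOp` class, the operator names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SemOp:
    name: str
    cardinality: int    # Card(o): rows the operator processes
    input_tokens: int   # |I_o|: assumed tokens sent per row
    output_tokens: int  # |O_o|: assumed tokens generated per row


def plan_cost(ops, fee_in, fee_out):
    """C(P) = sum over semantic ops of Card(o) * (|I_o|*fee_in + |O_o|*fee_out)."""
    return sum(o.cardinality * (o.input_tokens * fee_in + o.output_tokens * fee_out)
               for o in ops)


# Fees in arbitrary integer units per token (made-up values).
FEE_IN, FEE_OUT = 1, 4

# Plan A filters first, so the expensive extraction sees only 100 rows.
filter_first = [SemOp("sem_filter", 1000, 200, 5), SemOp("sem_extract", 100, 200, 50)]
# Plan B extracts from all 1000 rows and filters afterwards.
extract_first = [SemOp("sem_extract", 1000, 200, 50), SemOp("sem_filter", 1000, 200, 5)]

cost_a = plan_cost(filter_first, FEE_IN, FEE_OUT)
cost_b = plan_cost(extract_first, FEE_IN, FEE_OUT)
```

Under these assumed numbers the filter-first plan is cheaper, which is the kind of difference a rule-based rewrite that pushes selective operators earlier would exploit.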

6. Application domains, specialized variants, and open problems

Application reports show that the proxy abstraction is already broad. The Agentic RAG survey lists customer support, healthcare, legal, finance, education, and graph-enhanced multimodal reporting as representative settings, including Twitch Ad Sales on Bedrock, Patient Case Summary with LlamaCloud ADW, Contract Review, Auto Insurance Claims, Research Paper Report generation, and market survey generation with GeAR or Agent-G (Singh et al., 15 Jan 2025). These examples all use the proxy to route across APIs, internal knowledge bases, databases, or graph stores while preserving citations, provenance, or compliance.

Several specialized systems make the abstraction concrete. DynaTree targets time-sensitive news retrieval and reports Syft News average Recall@100 of 0.475 and average NDCG@10 of 0.757, with online production survival improving from an average of 0.45 for a fixed subtree to 0.67 for daily subtree selection (Qi et al., 29 May 2026). AgenticData translates natural-language questions into semantic plans over heterogeneous sources and reports 94.44% Easy and 50.79% Hard accuracy on DABStep, 44.5% on Spider-2.0-Lite, and 95% on the Wikipedia benchmark (Sun et al., 7 Aug 2025). OpenThoughts-Agent moves from serving proxies to training-data proxies, assembling a 100K-trajectory dataset whose Qwen3-32B fine-tune reaches an average of 44.8% across seven agentic benchmarks, improving over Nemotron-Terminal-32B at 40.9% (Raoof et al., 23 Jun 2026). AgentXRay applies the proxy concept to interpretability, reconstructing white-box workflows for black-box agents and reporting average SFE 0.426, better than AFlow at 0.339, while reducing token usage by 8–22% (Shi et al., 5 Feb 2026).

The remaining limitations are consistent across papers. Hallucination persists even with retrieval; sources may conflict or become stale; tools can be unreliable; long-horizon planning remains difficult; and cost or latency control, evaluation of agentic behaviors, multimodal cost modeling, cache coherence, quorum calibration, and governance under drift remain open (Singh et al., 15 Jan 2025, Giurgiu et al., 10 Dec 2025, Yan et al., 1 Jul 2026). A plausible implication is that the comprehensive proxy will remain a composite system rather than a single component: data mediation, provenance, governance, privacy, evaluation, and learning are being specified in separate subliteratures, and current research is gradually welding them into a unified runtime substrate.