Trade-offs in Decentralized Agentic AI Discovery Across the Compute Continuum

Published 12 May 2026 in cs.DC and cs.AI | (2605.11839v1)

Abstract: Agentic systems deployed across the compute continuum need discovery mechanisms that remain effective across cloud, edge, and intermittently connected domains. In some emerging agentic architectures, decentralized discovery is already an active design direction, placing DHT-based lookup on the path toward agent directories. This paper studies the trade-offs among major structured-overlay families for agent discovery, comparing Chord, Pastry, and Kademlia as candidate indexing substrates within a shared control-plane framework. Using a benchmark subset centered on a 4096-node stationary comparison and a representative 4096-node churn benchmark, the paper characterizes how discovery reliability, startup behavior, and control-plane overhead vary across these overlays. The goal is to clarify the operating points they expose for agent discovery across edge-to-cloud environments.


Authors (4)

Summary

The paper demonstrates that a log2 N warmup window raises discovery success from 0.60–0.64 to 1.0 while reducing observed message overhead and tail latency.
The evaluation benchmarks Chord, Pastry, and Kademlia overlays, highlighting Pastry’s low operational cost and Kademlia’s superior tail latency.
The study underscores key design trade-offs in decentralized discovery, emphasizing the importance of stabilizing overlay networks under churn.


Motivation and Architectural Context

Agentic systems now span cloud, edge, and intermittently connected domains, and they need robust, scalable, and efficient discovery mechanisms that go beyond classical centralized registries. In emerging agentic AI architectures such as AGNTCY, the agent directory becomes operational infrastructure that directly mediates orchestration, scalability, and cross-tier coordination. Within such frameworks, decentralized discovery, typically realized via distributed hash table (DHT) overlays, becomes a primary control-plane function: descriptors are published, queries are issued, candidates are retrieved, and orchestration proceeds, as the conceptual workflow in Figure 1 shows.

Figure 1: Conceptual workflow wherein agent discovery is handled as a control-plane operation, decoupling publication, retrieval, and invocation for maximal scalability.
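The publish, query, and retrieve steps of this workflow can be illustrated with a minimal in-memory sketch. It is not the paper's implementation: `ToyDirectory`, `skill_key`, the 32-bit identifier space, and the "numerically closest ID" replica rule are all assumptions made for illustration, and no overlay routing is modeled. Only the replication level of three, the 4096-node scale, and the `skill_05` query target come from the benchmark description.

```python
import hashlib
from collections import defaultdict

ID_BITS = 32  # hypothetical identifier width, chosen only for this sketch


def skill_key(skill: str) -> int:
    """Hash a skill name into the identifier space."""
    digest = hashlib.sha256(skill.encode("utf-8")).digest()
    return int.from_bytes(digest[: ID_BITS // 8], "big")


class ToyDirectory:
    """In-memory stand-in for a DHT-backed agent directory (no routing)."""

    def __init__(self, node_ids, replication=3):
        self.node_ids = sorted(node_ids)
        self.replication = replication
        # node ID -> key -> set of agent IDs stored on that node
        self.store = defaultdict(lambda: defaultdict(set))

    def _replicas(self, key):
        # Simplification: replicate on the nodes numerically closest to the key.
        # Real overlays each define closeness in their own way.
        return sorted(self.node_ids, key=lambda n: abs(n - key))[: self.replication]

    def publish(self, agent_id, skills):
        """Store the agent under every skill key on each replica node."""
        for skill in skills:
            key = skill_key(skill)
            for node in self._replicas(key):
                self.store[node][key].add(agent_id)

    def query(self, skill):
        """Collect candidate agents for a skill from its replica nodes."""
        key = skill_key(skill)
        candidates = set()
        for node in self._replicas(key):
            candidates |= self.store.get(node, {}).get(key, set())
        return sorted(candidates)


# 4096 evenly spaced node IDs, matching the benchmark scale.
directory = ToyDirectory(node_ids=range(0, 2**ID_BITS, 2**20), replication=3)
directory.publish("agent-a", ["skill_05", "skill_12"])
directory.publish("agent-b", ["skill_05"])
print(directory.query("skill_05"))  # ['agent-a', 'agent-b']
```

Invocation of the returned candidates is deliberately left out, which mirrors the decoupling of publication, retrieval, and invocation described in the caption.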

This architectural shift raises the question of how well the candidate DHT overlays (Chord, Pastry, and Kademlia) suit agent discovery, judged not merely by asymptotic complexity but by startup behavior, reliability, tail latency, and communication overhead under heterogeneous scale and membership dynamics.

Evaluation Methodology

The paper adopts a service-level perspective rooted in ADS-centric agent directory abstractions. Evaluations use a tightly controlled workload: 4096 nodes, a 50-skill catalog, uniform publication per agent, replication level three, the query target skill_05, and standardized lookup, publication, and republish semantics. Operator-facing metrics include:

Discovery success and recall: Immediate operational correctness.
Observed messages/query: Aggregate control-plane communication cost during live operation.
Query-only GET messages/query: Isolated lookup traffic excluding maintenance overhead.
P95 latency: Proxy for routing efficiency, emphasizing tail behavior.
Robustness under churn: Stress resilience in the face of session dynamics.
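To make the metrics above concrete, the following sketch computes them from per-query records. The record layout, the function names, and the exact definitions are assumptions rather than the paper's: success is taken to mean "at least one relevant agent returned", recall is averaged per query, and P95 uses the nearest-rank definition.

```python
import math
from dataclasses import dataclass


@dataclass
class QueryRecord:
    returned: set        # agent IDs returned by the lookup
    relevant: set        # agent IDs that actually advertise the skill (non-empty)
    latency: float       # latency in simulation time units
    get_messages: int    # GET messages attributable to this query


def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records, total_messages):
    """Aggregate operator-facing metrics over a batch of queries.

    total_messages counts every control-plane message observed during the
    run, including publication and republish traffic.
    """
    n = len(records)
    hits = [r.returned & r.relevant for r in records]
    return {
        "success": sum(1 for h in hits if h) / n,
        "recall": sum(len(h) / len(r.relevant) for h, r in zip(hits, records)) / n,
        "p95_latency": percentile([r.latency for r in records], 95),
        "observed_msgs_per_query": total_messages / n,
        "get_msgs_per_query": sum(r.get_messages for r in records) / n,
    }
```

Keeping `observed_msgs_per_query` and `get_msgs_per_query` separate is what lets maintenance traffic be distinguished from lookup traffic later in the analysis.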

Benchmarks encompass two regimes: a stationary comparison at scale (N=4096, immediate vs. log2 N warmup startup) and a representative churn scenario (exponential session churn, same-ID rejoins).

Stationary Regime and Startup Effects

When queries are issued immediately after bootstrap, all overlays suffer substantial cold-start penalties. Discovery success hovers at 0.60–0.64, P95 latency is elevated (15–30 units), and observed messages/query reach 10–70k, largely because concurrent publication and republish activity interferes with lookup traffic. This cold-start interference is not trivial; it directly affects operator-visible effectiveness and cost.

A brief warmup (log2 N units) neutralizes the cold-start penalty: discovery success converges to 1.0, recall stabilizes, tail latency drops below 8 units, and observed communication cost compresses to 3.6–12.2k messages/query. Query-only GETs are markedly lower (8–16 per query), underscoring that the bulk of control-plane overhead in cold start arises from maintenance traffic.

Figure 2: Warming up the overlay (log2 N startup) rapidly restores full success and minimizes message cost and query latency across protocols.
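For the benchmark scale, the warmup window is short in absolute terms. The snippet below works out log2 N for N = 4096; the function name is hypothetical, and rounding up for node counts that are not powers of two is an assumption, since the summary only reports the 4096-node case.

```python
import math


def warmup_window(n_nodes: int) -> int:
    """Warmup length in time units, taken here as ceil(log2 N)."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be positive")
    return math.ceil(math.log2(n_nodes))


print(warmup_window(4096))  # 12
```

A delay of 12 time units is thus enough, in this benchmark, to move all three overlays from roughly 0.6 success to full success.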

Comparing overlays, Pastry consistently achieves the lowest post-bootstrap cost and latency, Kademlia excels in tail latency but with pronounced communication overhead, while Chord sits as a higher-cost, intermediate operating point.

Churn Regime and Robustness

Under representative churn (session mean 100, downtime mean 30), all overlays retain perfect success (1.0) and precision, with nearly identical recall. Pastry offers a P95 latency of 5 units at 5.8k messages/query; Chord and Kademlia both incur nearly double that traffic, with Kademlia achieving the lowest tail latency (4.6 units).

Figure 3: Churn benchmark results confirm all overlays maintain discovery correctness, with Pastry retaining lowest communication cost and Kademlia favorable in tail latency.
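The churn model can be sketched as an alternating sequence of online sessions and offline periods, both exponentially distributed, with the node keeping its identifier across rejoins. The means of 100 and 30 come from the benchmark description; the function names, the choice to start each node online, and the seeding are assumptions for illustration.

```python
import random


def sample_sessions(horizon, session_mean=100.0, downtime_mean=30.0, seed=0):
    """Return (join, leave) intervals for one node that rejoins with the same ID."""
    rng = random.Random(seed)
    intervals, t = [], 0.0
    while t < horizon:
        up = rng.expovariate(1.0 / session_mean)   # rate = 1 / mean
        intervals.append((t, min(t + up, horizon)))
        t += up + rng.expovariate(1.0 / downtime_mean)
    return intervals


def availability(intervals, horizon):
    """Fraction of the horizon during which the node was online."""
    return sum(end - start for start, end in intervals) / horizon


# Long-run expectation for an alternating on/off process:
# mean session / (mean session + mean downtime) = 100 / 130, about 0.77.
expected = 100.0 / (100.0 + 30.0)
```

Under these parameters each node is offline roughly 23% of the time in the long run, which gives a sense of how much membership turnover the overlays absorb while still returning correct results.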

The churn benchmark validates that post-bootstrap rankings hold under persistent membership changes. Full-scale deployments employing a brief stabilization window safeguard correctness, leaving operational cost and latency as primary optimization axes.

Synthesis and Practical Deployment Guidance

The empirical results delineate a clear trade-off landscape:

Pastry: Minimizes control-plane overhead post-bootstrap; optimal for deployments prioritizing operational cost.
Kademlia: Favors aggressive lookup regimes for lowest tail latency; suitable where quick response to queries is valued despite higher traffic.
Chord: Serves as a cost-latency intermediate; potentially advantageous in scenarios blending requirements.
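One way an operator might turn this guidance into a selection rule is to score measured operating points by a weighted blend of tail latency and traffic. The sketch below is an illustrative assumption, not a method from the paper: `OperatingPoint`, `pick_overlay`, and the linear scoring are hypothetical, and the operating points should be filled in from the operator's own measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    overlay: str
    p95_latency: float      # time units
    msgs_per_query: float   # observed messages per query


def pick_overlay(points, latency_weight):
    """Pick the overlay minimizing a weighted sum of normalized latency and cost.

    latency_weight is in [0, 1]: 0 means only traffic cost matters,
    1 means only tail latency matters.
    """
    if not 0.0 <= latency_weight <= 1.0:
        raise ValueError("latency_weight must be in [0, 1]")
    max_lat = max(p.p95_latency for p in points)
    max_msg = max(p.msgs_per_query for p in points)

    def score(p):
        return (latency_weight * p.p95_latency / max_lat
                + (1.0 - latency_weight) * p.msgs_per_query / max_msg)

    return min(points, key=score).overlay
```

With measurements shaped like those reported here, a low `latency_weight` would tend to select the lowest-traffic overlay and a high one the overlay with the best tail latency; intermediate weights are where a middle option can win.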

Immediate queries during bootstrap are discouraged in large-scale deployments due to severe cold-start interference. Ensuring a modest stabilization window prior to full admission yields robust correctness and manageable operational cost, underscoring the importance of controlled admission strategies.
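A controlled admission strategy of this kind can be sketched as a gate that defers queries until the stabilization window has elapsed. The class and method names are hypothetical, and queueing deferred queries (rather than rejecting them) is a design assumption; the paper's summary only states that immediate queries should be avoided.

```python
from collections import deque


class AdmissionGate:
    """Defers discovery queries until a stabilization window has elapsed."""

    def __init__(self, bootstrap_time, warmup_units):
        self.open_at = bootstrap_time + warmup_units
        self.deferred = deque()

    def submit(self, now, query, run_query):
        """Run the query if the overlay is past warmup; otherwise queue it."""
        if now < self.open_at:
            self.deferred.append(query)
            return None
        return run_query(query)

    def drain(self, now, run_query):
        """Release queued queries, in arrival order, once the window has passed."""
        results = []
        while now >= self.open_at and self.deferred:
            results.append(run_query(self.deferred.popleft()))
        return results
```

For example, a gate built as `AdmissionGate(bootstrap_time=0.0, warmup_units=12.0)` would hold a query submitted at time 3 and release it on the first `drain` call at or after time 12.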

Architecturally, the comparative framing is critical. Once an overlay family is selected as the reference substrate, subsequent ecosystem optimizations are constrained by its foundational routing, replication, and state-management logic. The family-centric evaluation thus guards against premature architectural lock-in and supports reasoned exploration of future discovery control-plane designs.

Implications and Future Directions

The findings directly inform agentic AI infrastructure design:

Agent directories in multi-tier, heterogeneous environments should incorporate warmup strategies to eliminate startup penalties.
Overlay family choice not only constrains the functional semantics but also shapes the ongoing optimization and maintenance envelope.
Benchmarks reveal practical bounds, not theoretical maxima; future work exploring broader workload diversity, richer network failure models, and protocol variants could shift optimal operating points but is unlikely to erase the identified trade-off structure.

Long-term, these comparative baselines serve as reference points for ADS-centric studies. Even as application-side semantics diversify, the ability to differentiate startup effects, lookup cost, and control-plane traffic remains vital.

Conclusion

This paper rigorously characterizes the operational trade-offs among Chord, Pastry, and Kademlia overlays as decentralized substrates for agentic AI discovery across the compute continuum (2605.11839). Pastry emerges as the lowest-cost post-warmup solution, Kademlia offers the best tail latency at higher communication cost, and Chord stands as an intermediate option. Cold-start effects are non-trivial at scale but are mitigated with a log2 N warmup window. Correctness is preserved under churn, so deployment choice is primarily governed by the desired cost-latency profile. The comparative approach equips agentic infrastructure designers with actionable guidance, supporting architectural governance and informed substrate selection as the agent ecosystem evolves.
