
SDN-Inspired Agentic Serving

Updated 13 January 2026
  • The paper introduces SDN-inspired agentic serving by mapping SDN's control/data-plane separation to dynamic orchestration of multi-stage AI workflows.
  • It details a framework where dedicated resource pools, per-stage isolation, and intent-driven policies optimize throughput, latency, and SLA satisfaction.
  • Empirical results show significant throughput gains, reduced tail latency, and efficient resource usage compared to traditional static serving architectures.

SDN-inspired agentic serving is a paradigm that transposes core software-defined networking (SDN) concepts—especially the separation of control and data planes, dynamic intent-driven orchestration, and real-time resource steering—into the domain of multi-stage, agentic AI workflow serving. This approach is motivated by the growing complexity and scale of LLM agent systems, where traditional monolithic or static serving architectures fail to meet requirements for scalability, efficiency, and SLA (SLO) satisfaction. SDN-inspired serving frameworks introduce programmable, intent-driven controllers that reason globally over agent workflows, decouple logical task graphs from physical execution, and adaptively orchestrate resources, routing, batching, and caching to maximize throughput, minimize latency, and efficiently utilize heterogeneous compute substrates.

1. Mapping SDN Principles to Agentic Serving

SDN-inspired agentic serving is grounded in a set of correspondences between classical SDN constructs and agentic workflow serving mechanisms:

  • Control/Data-Plane Decoupling: The controller (SDN control plane) is mirrored by a logically centralized orchestrator in agentic serving platforms, which maintains a global view of workflow DAGs, tracks system telemetry, computes scheduling and resource allocation policies, and issues decisions to distributed data-plane entities. The data plane comprises pools of homogeneous workers (e.g., GPU LLM engines, tool executors) that execute requests, subject to control-plane policies (Pagonas et al., 15 Oct 2025, Luo et al., 19 Feb 2025, Laju et al., 8 Jan 2026, Gim et al., 28 Oct 2025, Agarwal et al., 6 Jan 2026, Dai et al., 26 Nov 2025).
  • Flow Isolation: In SDN, flows or slices isolate traffic to prevent interference. In agentic serving, this is achieved via stage-wise separation—each workflow stage or agent type is provisioned with a dedicated resource pool and its own queue/cache, ensuring predictable performance and eliminating cross-stage cache thrashing (Pagonas et al., 15 Oct 2025).
  • Dynamic Resource Allocation: Similar to SDN’s programmable bandwidth steering and elastic rate limiting, agentic serving orchestrators dynamically resize resource pools, reassign idle engines, and adapt model or hardware allocations based on load and stage requirements (Pagonas et al., 15 Oct 2025, Chaudhry et al., 22 Aug 2025, Dai et al., 26 Nov 2025).
  • Programmability and Policy Expression: By exposing clear APIs (e.g., set/reset parameters, configuration rules, resource hints), SDN-inspired agentic serving makes communication and execution policies first-class, enabling high-level, intent-driven specification and runtime adaptation (Agarwal et al., 6 Jan 2026).
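The control/data-plane split described above can be sketched in a few lines of Python. This is an illustrative toy, not any paper's implementation: the `Controller` and `StagePool` classes, the proportional rebalancing policy, and all parameter values are assumptions chosen to show the pattern of a centralized controller consuming telemetry and pushing resize decisions to worker pools.

```python
from dataclasses import dataclass

@dataclass
class StagePool:
    """Data-plane entity: a pool of homogeneous workers for one workflow stage."""
    name: str
    engines: int          # current number of engines (e.g., GPU LLM workers)
    queue_depth: int = 0  # telemetry reported back to the control plane

class Controller:
    """Logically centralized control plane: observes telemetry from all stage
    pools and issues resize decisions, much as an SDN controller pushes
    forwarding rules to switches."""
    def __init__(self, pools):
        self.pools = {p.name: p for p in pools}

    def report(self, name, queue_depth):
        # Data plane -> control plane telemetry path.
        self.pools[name].queue_depth = queue_depth

    def rebalance(self, total_engines):
        # Intent-driven policy (illustrative): allocate engines in
        # proportion to observed queue depth, at least one per stage.
        demand = {n: max(p.queue_depth, 1) for n, p in self.pools.items()}
        total = sum(demand.values())
        for n, p in self.pools.items():
            p.engines = max(1, round(total_engines * demand[n] / total))

pools = [StagePool("plan", 2), StagePool("tool-exec", 2), StagePool("summarize", 2)]
ctrl = Controller(pools)
ctrl.report("plan", 10)
ctrl.report("tool-exec", 40)
ctrl.report("summarize", 5)
ctrl.rebalance(total_engines=8)
print({p.name: p.engines for p in pools})  # most engines go to the congested stage
```

The key design point carried over from SDN is that the rebalancing policy lives entirely in the controller; the pools only execute and report, so policies can be swapped without touching data-plane code.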

2. Core Architectures and System Components

SDN-inspired agentic serving platforms exhibit diverse system architectures, each reflecting the core SDN metaphors:

| System | Control Plane | Data Plane | Intermediate Planes/Features |
| --- | --- | --- | --- |
| Cortex (Pagonas et al., 15 Oct 2025) | Orchestrator (scheduling, scaling) | Stage engine pools (resource-isolated workers) | Engine-allocation layer, local KV caches |
| Pie (Gim et al., 28 Oct 2025) | User "inferlets" controlling logic | Wasm service handlers, GPU execution | Control layer with batch scheduler |
| Nalar (Laju et al., 8 Jan 2026) | Global and local controllers | Python stubs emitting futures to data-plane agents | Managed state; dependency-graph engine |
| SDAS (Agarwal et al., 6 Jan 2026) | Policy/intent controller | Data-plane shims and agent runtimes | Metrics plane |
| Murakkab (Chaudhry et al., 22 Aug 2025) | Declarative workflow DAG, MILP optimizer | Task executors with physical model/hardware assignment | Adaptive runtime, registry |
| Aragog (Dai et al., 26 Nov 2025) | Just-in-time accuracy routing | Per-stage model assignment, beam search | Router heads, real-time resource telemetry |

Architectural flows universally feature (i) a clear API or signaling path between control and data planes for dynamic reconfiguration, (ii) resource and policy isolation at per-stage/agent granularity, and (iii) state and cache management reflecting SDN-style per-flow or per-switch state.

3. Mathematical and Algorithmic Foundations

SDN-inspired agentic serving imposes formal models for scheduling, routing, and resource allocation, often instantiated as mixed-integer programs or queueing-theoretic formulations:

  • Workflow Stage Model: For $K$ stages, with $c_s$ engines at stage $s$, per-engine service rate $\mu_s$, external arrival rate $\lambda_1$, and selectivity factors $\sigma_s$, the load at each stage is $\lambda_s = \lambda_1 \prod_{j < s} \sigma_j$. Throughput and stability require $\lambda_s \leq c_s \mu_s$ for all $s$ (Pagonas et al., 15 Oct 2025).
  • Latency-SLO Optimization: End-to-end P99 latency aggregates per-stage queueing bounds, e.g., $\sum_{s=1}^{K} T_s^{99}(\rho_s, c_s) \leq L_{\max}$, with each stage typically approximated by M/M/c Erlang-C bounds (Pagonas et al., 15 Oct 2025). Agentic serving frameworks maximize $\lambda_1$ subject to stability, latency, and resource constraints.
  • KV Cache Hit Modeling: Cache hit probability in each pool follows $h_s(c_s) = 1 - e^{-\kappa_s c_s}$, directly relating pool sizing to cache locality (Pagonas et al., 15 Oct 2025).
  • Configuration Selection and Per-Stage Scheduling: Systems such as Aragog decouple input-accuracy constraints (static) from dynamic, real-time per-stage model assignment (cost/latency), exploiting monotonicity and beam search for combinatorial efficiency (Dai et al., 26 Nov 2025).
  • MILP for Workflow Placement: Profile-guided assignment of models, hardware, and runtime knobs subject to SLOs, energy, and cost (e.g., in Murakkab), enables optimal resource deployment and adaptive reoptimization (Chaudhry et al., 22 Aug 2025).
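The stage-load, stability, and cache-hit relations above can be checked numerically. The sketch below uses illustrative parameter values (the rates, selectivities, pool sizes, and $\kappa_s$ constants are assumptions, not figures from any paper) to show how the formulas compose into a per-stage feasibility check.

```python
import math

# Illustrative parameters for a K = 3 stage workflow (all values assumed):
mu = [4.0, 2.0, 5.0]     # per-engine service rates mu_s (req/s)
sigma = [0.8, 0.5]       # selectivity sigma_s: fraction forwarded past stages 1 and 2
c = [3, 4, 1]            # engines c_s provisioned per stage
kappa = [0.6, 0.6, 0.6]  # cache-locality constants kappa_s

lam1 = 10.0              # external arrival rate lambda_1 into stage 1

# lambda_s = lambda_1 * prod_{j < s} sigma_j
lam = [lam1]
for s in sigma:
    lam.append(lam[-1] * s)

for s, (l, cs, ms, ks) in enumerate(zip(lam, c, mu, kappa), start=1):
    rho = l / (cs * ms)              # utilization; stability needs lambda_s <= c_s * mu_s
    hit = 1 - math.exp(-ks * cs)     # cache hit model h_s(c_s) = 1 - e^{-kappa_s c_s}
    print(f"stage {s}: lambda={l:.2f} rho={rho:.2f} stable={rho <= 1} hit={hit:.2f}")
```

A full optimizer would search over the $c_s$ (and model/hardware choices) to maximize $\lambda_1$ under these stability and latency constraints; the loop above is only the inner feasibility evaluation.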

4. Agent-Native Extensions and SDN Parallels

Several platforms further extend the SDN analogy to enable advanced runtime and workflow mechanisms:

  • Malleable Resource Management: Analogous to SDN rate limiters, control planes can swap in lighter LLM variants or dynamically adjust retry/pruning depths to handle system load and stragglers (Pagonas et al., 15 Oct 2025).
  • Speculative Execution and Parallel Path Probing: Similar to fast reroute, speculative execution precomputes likely future branches or tool outputs, enabling cache warming and latency reduction upon commit (Pagonas et al., 15 Oct 2025).
  • Multi-Tiered Agentic State Caching: Mirroring SDN's hierarchical cache/state stores, stage-local caches are complemented by global multi-tenant workflow caches (e.g., Redis, in-GPU NVLink), enabling cross-agent artifact reuse and state sharing (Pagonas et al., 15 Oct 2025).
  • Semantic Control and Cross-Layer Reasoning: In space networks, insertion of agentic-semantic layers enables context-rich reasoning across SDN’s real-time, near-RT, and non-RT control loops, including delay-adaptive planning and semantic compression to maximize wireless bandwidth efficiency (Baena et al., 12 Jun 2025).
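The speculative-execution idea above, launching likely branches ahead of commit, can be sketched with standard Python concurrency. This is a minimal illustration under assumed names (`run_branch`, `speculative_execute` are hypothetical), not any system's actual API; a real serving stack would additionally cancel losing branches and warm caches with their partial results.

```python
from concurrent.futures import ThreadPoolExecutor

def run_branch(branch):
    # Stand-in for an expensive downstream call (LLM step, tool invocation).
    return f"result-of-{branch}"

def speculative_execute(branches, chosen):
    """Launch all likely next branches in parallel (cf. SDN fast reroute);
    when the agent commits to one, its result is already available and the
    speculative work on the others is discarded."""
    with ThreadPoolExecutor() as pool:
        futures = {b: pool.submit(run_branch, b) for b in branches}
        return futures[chosen].result()

print(speculative_execute(["search", "calculate", "summarize"], chosen="calculate"))
```

The trade-off, as with fast reroute, is wasted work on unchosen branches in exchange for lower commit latency, which is why branch-probability estimates matter for this mechanism.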

5. Comparative Evaluation and Empirical Results

Quantitative evaluations consistently document large gains in throughput, tail latency, and resource utilization:

| System | Throughput Gain | Tail Latency Reduction | Notable Resource Gains |
| --- | --- | --- | --- |
| Cortex (Pagonas et al., 15 Oct 2025) | +71% over monolithic | P99 900 ms vs. 1300 ms (-31%) | 40% fewer GPU-hours, lower KV-memory footprint |
| Pie (Gim et al., 28 Oct 2025) | 1.3–3.4× over vLLM/SG | Up to 15% lower agentic latency | 3–12% per-token overhead on text, higher utilization |
| Murakkab (Chaudhry et al., 22 Aug 2025) | 2.8× fewer GPUs | SLO met across all workloads | 3.7× less energy, 4.3× less cost |
| Nalar (Laju et al., 8 Jan 2026) | Up to 2.9× speedup | 34–74% lower P99 latencies | Sustains 80 RPS at <50 s avg when others collapse |
| Aragog (Dai et al., 26 Nov 2025) | 50–217% throughput | 32–78% median latency | <1% loss in accuracy, negligible routing overhead |
| Autellix (Luo et al., 19 Feb 2025) | 4–15× over vLLM | P99 consistently lower | Head-of-line blocking eliminated at program level |

Rigorously partitioned stage pools, program-aware scheduling, and just-in-time per-stage configuration assignment account for much of these empirical improvements.

6. Generalization, Trade-offs, and Open Challenges

  • Generalization: SDN-inspired agentic serving scales to multi-tenant, federated, and edge/cloud hybrid deployments through recursive application of the control/data-plane split. The same design patterns are applicable to structured multimodal workflows, tool use pipelines, and agentic semantic control in wireless/space networks (Baena et al., 12 Jun 2025, Pagonas et al., 15 Oct 2025).
  • Trade-offs: Over-isolation can lead to resource fragmentation, particularly for short/bursty stages; centralized orchestration introduces complexity and potential bottlenecks (Pagonas et al., 15 Oct 2025). Excessive programmability may complicate validation and cross-workflow optimization.
  • Open Research Directions: These include formalizing interfaces for resource malleability, optimizing speculative execution under branch uncertainty, designing hierarchical and geo-distributed orchestration, and developing intent-based APIs that expose the full range of SDN-inspired control to application-level logic (Pagonas et al., 15 Oct 2025, Dai et al., 26 Nov 2025, Chaudhry et al., 22 Aug 2025).

7. Future Outlook

The SDN-inspired blueprint provides a unifying systems framework for scalable, policy-driven agentic serving. By treating workflow stages as network flows, resource and scheduling policies as forwarding rules, and agent state as distributed cache or semantic context, SDN-inspired platforms unlock fully programmable, high-throughput, and responsiveness-optimized infrastructure for complex agentic AI applications. Ongoing and future research will further clarify optimal trade-offs between isolation and sharing, formalize policy interfaces, and extend the architecture to support highly dynamic, distributed, and intent-driven serving across future compute fabrics (Pagonas et al., 15 Oct 2025, Gim et al., 28 Oct 2025, Dai et al., 26 Nov 2025, Baena et al., 12 Jun 2025).
