- The paper introduces PolicyCache-SDN, which applies hierarchical intra-path learning via a two-plane controller-agent architecture for adaptive SDN traffic control.
- It achieves significant improvements, including 84% average link utilization, up to 40% reduction in flow completion times, and a 62.6% decrease in SLA violations.
- The study validates robust performance, low operational overhead, and scalability while ensuring safe, auditable network operations through policy envelope enforcement.
Hierarchical Intra-Path Learning for Adaptive SDN Traffic Control with PolicyCache-SDN
Architecture and Control Abstraction
PolicyCache-SDN formalizes hierarchical intra-path learning within SDN traffic control, introducing a two-plane architecture comprising a global SDN controller and distributed edge agents. The controller operates at a slower cadence (hundreds of milliseconds or longer), maintaining topology, aggregating telemetry, generating policy envelopes, and arbitrating conflicting agent actions. Edge agents, placed at ToR switches or vSwitches, run local online learning for their assigned path aggregates, adapting and committing actions such as metering, queue adjustment, and rerouting at fine granularity (typically 50 ms intervals). Policy envelopes codify controller intent: they set rate bounds, reroute permissions, and utility weights, and they keep edge-agent actions safe, auditable, and reversible. This split design keeps exploration local while guaranteeing compliance with global policies and resource isolation.
Figure 1: PolicyCache-SDN two-plane architecture: the central SDN controller distributes envelopes to edge agents for rapid intra-path learning.
Controller-Agent Interface and Action Cycle
The controller-agent message flow is asynchronous and lightweight: the controller pushes envelopes on version change, and agents return telemetry and action logs. The controller compiles envelopes using weighted water-filling over bottlenecks, tenant policies, service templates, and measured demand. Each envelope is a contract: edge agents must clip their actions (rate, reroute, queue) to the bounds specified in the envelope and must disable rerouting if the envelope becomes stale. The agent interval loop alternates between two modes. In backup exploration, the agent probes actions within the envelope and empirically selects local optima. In model execution, it acts on a Hoeffding Adaptive Tree (HAT) trained online, with canary probes and fallback mechanisms for concept drift, which is detected with ADWIN.
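The envelope contract described above can be illustrated with a minimal sketch. The field names (`min_rate_mbps`, `ttl_s`, and so on), the `Action` type, and the time-to-live staleness rule are all assumptions made for illustration; the source does not specify the envelope schema or how staleness is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical envelope fields; names are illustrative, not from the paper."""
    version: int
    min_rate_mbps: float
    max_rate_mbps: float
    allow_reroute: bool
    max_queue_priority: int
    issued_at: float  # seconds, on the same clock as `now`
    ttl_s: float      # assumed staleness threshold


@dataclass(frozen=True)
class Action:
    rate_mbps: float
    queue_priority: int
    reroute: bool


def clip_action(proposed: Action, env: Envelope, now: float) -> Action:
    """Clip a proposed action to the envelope; drop reroutes if it is stale."""
    stale = (now - env.issued_at) > env.ttl_s
    # Rate and queue priority are clamped into the envelope's bounds.
    rate = min(max(proposed.rate_mbps, env.min_rate_mbps), env.max_rate_mbps)
    prio = min(max(proposed.queue_priority, 0), env.max_queue_priority)
    # Rerouting needs both permission and a fresh envelope.
    reroute = proposed.reroute and env.allow_reroute and not stale
    return Action(rate, prio, reroute)
```

In this sketch a stale envelope still bounds rate and queue actions, so an agent keeps operating inside its last valid limits and only loses the ability to start new reroutes.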
Figure 2: Controller–agent interface: envelopes, bottleneck alerts, audit logs, and OVS actions flow asynchronously between controller and edge agent.
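The weighted water-filling step that the controller uses to compile envelopes can be sketched for the simplest case, a single bottleneck link. This is a generic weighted max-min allocation, not the paper's implementation: the function name, the dictionary inputs, and the restriction to one link are assumptions, and tenant policies and service templates are reduced here to a per-aggregate weight.

```python
def weighted_water_fill(capacity, demands, weights):
    """Weighted max-min fair shares of one link's capacity.

    demands and weights map an aggregate id to a demand (same unit as
    capacity) and a positive weight. Returns a dict of allocations.
    """
    alloc = {k: 0.0 for k in demands}
    active = {k for k in demands if demands[k] > 0}
    remaining = float(capacity)
    while active and remaining > 1e-12:
        total_w = sum(weights[k] for k in active)
        level = remaining / total_w  # capacity per unit of weight
        # Aggregates whose demand fits under their weighted share are capped
        # at their demand, which frees capacity for the others.
        satisfied = {k for k in active if demands[k] <= weights[k] * level}
        if not satisfied:
            # Everyone left is bottlenecked: split by weight and stop.
            for k in active:
                alloc[k] = weights[k] * level
            return alloc
        for k in satisfied:
            alloc[k] = demands[k]
            remaining -= demands[k]
        active -= satisfied
    return alloc
```

For example, with a capacity of 10, demands of 2, 8, and 8, and weights of 1, 1, and 2, the first aggregate receives its full demand of 2 and the remaining 8 units are split 1:2 between the other two. Allocations of this kind could then serve as the rate upper bounds in each envelope.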
The scope of intra-path learning is the path aggregate, not the individual flow. The state vector captures utilization, queue, loss, ECN marks, throughput, delay, and interval changes. The action space covers discrete metering, queue priority adjustment, and reroute triggers for elephant flows. Utility function weights are distributed by the controller and adjusted per service class, shifting local agent objectives as SLA status or spare capacity changes. Empirical labels are continuously refreshed through envelope-restricted probes, so learning and execution remain strictly bounded. Controller arbitration serializes conflicting agent actions at shared bottlenecks; envelope compliance is formally guaranteed, but global SLA invariants require timely synchronization and drift detection.
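How controller-distributed weights can shift an agent's local choice is sketched below. The linear utility form, the observation keys, and the `probe` callback are assumptions for illustration only; the source names the state features and says the weights vary per service class, but does not give the utility function.

```python
from typing import Callable, Dict, Sequence


def utility(obs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed linear utility: reward throughput, penalize delay and loss."""
    return (weights["throughput"] * obs["throughput"]
            - weights["delay"] * obs["delay"]
            - weights["loss"] * obs["loss"])


def best_probe(candidates: Sequence[float],
               probe: Callable[[float], Dict[str, float]],
               weights: Dict[str, float],
               lo: float, hi: float) -> float:
    """Probe only rates inside the envelope bounds [lo, hi]; keep the best."""
    allowed = [c for c in candidates if lo <= c <= hi]
    if not allowed:
        raise ValueError("no candidate action lies inside the envelope")
    scored = [(utility(probe(c), weights), c) for c in allowed]
    # max() returns the first maximal entry, so ties go to the earlier rate.
    return max(scored, key=lambda t: t[0])[1]


def fake_probe(rate: float) -> Dict[str, float]:
    """Toy path model (hypothetical): throughput saturates, delay grows."""
    return {"throughput": min(rate, 600.0),
            "delay": 1.0 + max(0.0, rate - 500.0) * 0.05,
            "loss": 0.0}
```

With these toy numbers, a throughput-oriented weight set selects a rate of 600 while a delay-heavy weight set selects 400, and a candidate of 1000 is never probed because it lies outside the bounds.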
Empirical Evaluation and Numerical Results
Extensive evaluation on a 1,024-host AWS Clos-fabric software SDN testbed demonstrates significant improvements across multiple axes:
Figure 4: Elephant FCT CDF is left-shifted by 33% over Static ECMP; mice flows also improve.
Figure 5: Mean and P99 FCT summary, highlighting PolicyCache-SDN's superior central and tail performance over all software baselines.
- Tail Latency and SLA Violations: PolicyCache-SDN reduces P99 delay for real-time flows by 37.7% relative to Static Meter and 11.7% over Centralized TE. SLA violation rate drops from 18.2% to 6.8% (62.6% reduction), outperforming all baselines. Notably, Aurora-SDN’s violation rate doubles under out-of-distribution traffic.
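As a check on the SLA figure above, the relative reduction follows directly from the two reported violation rates:

```latex
\frac{18.2\% - 6.8\%}{18.2\%} = \frac{11.4}{18.2} \approx 0.626 = 62.6\%
```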
Figure 6: PolicyCache-SDN rapidly stabilizes P99 delay after traffic matrix shift, outperforming offline RL and other baselines.
Figure 7: PolicyCache-SDN achieves the lowest tail latency and SLA violation rate under mixed real-time workloads.
Convergence, Scalability, Overhead, and Coordination
All agents reach execution mode within 400 ms of a cold start, and post-drift convergence remains below 350 ms at P95. Per-agent CPU and memory use remain minimal (<2.1% and <13.4 MB, respectively). Controller arbitration suppresses bottleneck oscillations: reroute-flip events drop from 4.3/s to 0.4/s, and utilization variance narrows from ±22 pp to ±5.1 pp. PolicyCache-SDN scales to 64 agents, sustaining 84% utilization; baselines plateau.
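One way a controller could serialize conflicting reroutes at a shared bottleneck is sketched below. The `Arbiter` class, the `expected_gain` field, and the hold-down timer are hypothetical; the source states that arbitration serializes conflicting actions and reduces reroute flips, but does not describe the mechanism.

```python
from typing import Dict, List, NamedTuple


class RerouteRequest(NamedTuple):
    agent_id: str
    bottleneck: str       # id of the shared link the reroute would affect
    expected_gain: float  # hypothetical score reported by the agent


class Arbiter:
    """Grant at most one reroute per bottleneck per hold-down window."""

    def __init__(self, hold_down_s: float):
        self.hold_down_s = hold_down_s
        self._last_grant: Dict[str, float] = {}

    def arbitrate(self, requests: List[RerouteRequest],
                  now: float) -> List[RerouteRequest]:
        # Keep only the highest-gain request for each bottleneck.
        best: Dict[str, RerouteRequest] = {}
        for r in requests:
            cur = best.get(r.bottleneck)
            if cur is None or r.expected_gain > cur.expected_gain:
                best[r.bottleneck] = r
        granted = []
        for link, r in best.items():
            last = self._last_grant.get(link)
            # Skip links that were granted a reroute too recently (assumed
            # hold-down rule, used here to damp reroute flips).
            if last is not None and now - last < self.hold_down_s:
                continue
            self._last_grant[link] = now
            granted.append(r)
        return granted
```

In this sketch, two agents contending for the same link in the same round produce a single grant, and repeated requests inside the hold-down window are dropped rather than queued.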

Figure 8: (a) Agent convergence time distribution; (b) Average link utilization increases with agent count, plateauing at 84% for PolicyCache-SDN.
Figure 9: Convergence and per-agent overhead remain low even as scale and stress increase.
Component Ablations and Sensitivity Analysis
Ablation studies confirm the incremental benefit of each action type: disabling rerouting drops average utilization by 7.7 pp, disabling queue priority by 3.9 pp, and disabling metering by 1.9 pp. Interval and rate step sizes directly trade adaptation speed for SLA compliance. Parameter sensitivity is low; convergence is robust under traffic shifts and envelope staleness.
Figure 10: Rerouting contributes most to utilization; queue priority and metering provide additive gains.
Figure 11: Centralized TE update-interval sensitivity: faster loops improve TE but at high CPU cost; split agent-envelope design is consistently superior.
Robustness and Stress Testing
PolicyCache-SDN maintains compliance and performance under envelope staleness, telemetry loss, delayed actions, controller outages, traffic oscillation, and excessive rerouting. During a controller outage, agents continue to act within the last valid envelope but issue no new reroutes, so enforcement is resilient to control plane failures.
Figure 12: Robustness heatmap shows minimal degradation across utilization, tail latency, SLA violations, and disorder under multiple stress scenarios.
Theoretical and Practical Implications
By lifting intra-flow locality to intra-path aggregates, PolicyCache-SDN enables safe, composable online learning in SDN fabrics. The controller–agent abstraction addresses non-trivial coordination and arbitration in the presence of reroutes and shared bottlenecks, which is critical for multi-agent distributed learning. The policy envelope concept generalizes to other SDN actions and multi-tenant fabrics, promoting scalable, auditable control over fast reaction cycles. PolicyCache-SDN outperforms both classical and learned baselines, which indicates empirically that static policies and offline-trained RL are insufficient in the presence of dynamic and unpredictable traffic matrices.
Future Directions
PolicyCache-SDN opens several avenues for further research: hardware-fabric validation using P4 and SmartNIC, integration of adaptive MARL and online-RL baselines, expansion to multi-domain and inter-AS traffic engineering, and rigorous analysis of multi-agent learning with discrete control actions. The envelope abstraction suggests applicability to broader network management and programmable data-plane learning (e.g., in-network classifiers, decentralized RL in federated environments).
Conclusion
PolicyCache-SDN renders SDN traffic control adaptive, auditable, and robust via hierarchical intra-path online learning. Empirical results demonstrate significant, quantifiable gains in utilization, completion times, tail latency, SLA compliance, scalability, and robustness. The policy envelope mechanism and controller–agent abstraction provide a practical path forward for scalable, safe, and flexible traffic engineering in modern datacenter and wide-area SDN deployments (2605.09473).