PolicyCache-SDN: Hierarchical Intra-Path Learning for Adaptive SDN Traffic Control

Published 10 May 2026 in cs.NI | (2605.09473v1)

Abstract: Software-defined networks offer global visibility, yet centralized control loops are too slow for transient congestion and bursty traffic dynamics. Existing learned traffic control schemes often rely on offline training, making them fragile under distribution shifts. We present PolicyCache-SDN, a hierarchical SDN traffic control framework that enables local online adaptation under centralized policy control. Its key abstraction is a policy envelope: the controller compiles network-wide intent into bounded per-path action spaces, while edge agents learn and execute metering, queueing, and rerouting decisions only within those bounds. Policy envelopes also make local actions auditable and reversible when they affect shared bottlenecks. Evaluation on a 1,024-host software SDN testbed shows that PolicyCache-SDN improves average core link utilization by 35.5% over Static ECMP and 18.3% over Centralized TE. It reduces elephant-flow P99 FCT by 34.3% over end-host congestion control, lowers SLA violations from 18.2% to 6.8%, and uses less than 2% CPU and 12 MB memory per edge agent. The source code is available in an anonymized repository at https://anonymous.4open.science/r/JCC2026-PolicyCache-SDN/.


Authors (6)

Summary

The paper introduces PolicyCache-SDN, which applies hierarchical intra-path learning via a two-plane controller-agent architecture for adaptive SDN traffic control.
It achieves significant improvements, including 84% average link utilization, up to 40% reduction in flow completion times, and a 62.6% decrease in SLA violations.
The study validates robust performance, low operational overhead, and scalability while ensuring safe, auditable network operations through policy envelope enforcement.

Hierarchical Intra-Path Learning for Adaptive SDN Traffic Control with PolicyCache-SDN

Architecture and Control Abstraction

PolicyCache-SDN formalizes hierarchical intra-path learning within SDN traffic control through a two-plane architecture comprising a global SDN controller and distributed edge agents. The controller operates at a slower cadence (hundreds of milliseconds or longer): it maintains topology, aggregates telemetry, generates policy envelopes, and arbitrates conflicting agent actions. Edge agents, placed at ToR switches or vSwitches, run local online learning for their assigned path aggregates, adapting and committing actions such as metering, queue adjustment, and rerouting at fine granularity (typically 50 ms intervals). Policy envelopes codify controller intent by setting rate bounds, reroute permissions, and utility weights, and they keep edge-agent actions safe, auditable, and reversible. This split design keeps exploration local while guaranteeing compliance with global policies and resource isolation.

Figure 1: PolicyCache-SDN two-plane architecture: the central SDN controller distributes envelopes to edge agents for rapid intra-path learning.

Controller-Agent Interface and Action Cycle

The controller-agent message flow is asynchronous and lightweight: the controller pushes envelopes whenever their version changes, and agents return telemetry and action logs. The controller compiles envelopes using weighted water-filling over bottlenecks, tenant policies, service templates, and measured demand. Each envelope is a contract: edge agents must clip their actions (rate, reroute, queue) to the bounds specified in the envelope and must disable rerouting if the envelope becomes stale. In each interval, the agent loop alternates between backup exploration, which probes actions within the envelope and empirically selects local optima, and model execution, which follows a Hoeffding Adaptive Tree (HAT) trained online with canary probes and fallback mechanisms tied to concept drift detection (ADWIN).
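To make the water-filling step concrete, the following is a minimal sketch of weighted water-filling (weighted max-min fairness) on a single bottleneck link. It is an illustration under stated assumptions, not the authors' compiler: the function name `weighted_water_fill`, the dictionary-based inputs, and the restriction to one link are all hypothetical, and the tenant policies and service templates mentioned above are reduced here to a single positive weight per path aggregate.

```python
from typing import Dict


def weighted_water_fill(
    capacity: float,
    demands: Dict[str, float],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weighted max-min allocation of one link's capacity (illustrative only).

    Each aggregate receives at most its demand; leftover capacity is shared
    among still-unsatisfied aggregates in proportion to their weights.
    Weights are assumed to be strictly positive.
    """
    alloc = {k: 0.0 for k in demands}
    active = {k for k, d in demands.items() if d > 0}
    remaining = float(capacity)

    while active and remaining > 1e-12:
        total_w = sum(weights[k] for k in active)
        level = remaining / total_w  # capacity per unit of weight
        # Aggregates whose demand fits under their weighted share are capped.
        satisfied = {k for k in active if demands[k] <= weights[k] * level}
        if not satisfied:
            # Nobody fits: split what is left proportionally and stop.
            for k in active:
                alloc[k] = weights[k] * level
            break
        for k in satisfied:
            alloc[k] = demands[k]
            remaining -= demands[k]
        active -= satisfied

    return alloc


if __name__ == "__main__":
    shares = weighted_water_fill(
        capacity=10.0,
        demands={"a": 2.0, "b": 8.0, "c": 8.0},
        weights={"a": 1.0, "b": 1.0, "c": 2.0},
    )
    print(shares)
```

In this example aggregate `a` is fully satisfied with 2 units, and the remaining 8 units are split 1:2 between `b` and `c` (about 2.67 and 5.33). One plausible use of such an allocation is as the upper rate bound written into each aggregate's envelope, although the source does not spell out that mapping.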

Figure 2: Controller–agent interface: envelopes, bottleneck alerts, audit logs, and OVS actions flow asynchronously between controller and edge agent.
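The envelope-as-contract rule (clip every action to the envelope bounds, and disable rerouting when the envelope is stale) can be sketched as follows. This is a hedged illustration: the `PolicyEnvelope` and `Action` types, their field names, and the `max_age_s` staleness threshold are assumptions introduced here, not the message format used by PolicyCache-SDN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyEnvelope:
    """Hypothetical envelope fields; the real schema is not given in the source."""
    version: int
    min_rate_mbps: float
    max_rate_mbps: float
    allow_reroute: bool
    issued_at: float   # seconds, on the same clock as `now`
    max_age_s: float   # assumed staleness threshold


@dataclass(frozen=True)
class Action:
    rate_mbps: float
    reroute: bool


def clip_action(action: Action, env: PolicyEnvelope, now: float) -> Action:
    """Project a proposed action into the envelope.

    The rate is clamped to [min_rate_mbps, max_rate_mbps]. Rerouting is kept
    only if the envelope permits it and the envelope is not stale.
    """
    stale = (now - env.issued_at) > env.max_age_s
    rate = min(max(action.rate_mbps, env.min_rate_mbps), env.max_rate_mbps)
    reroute = action.reroute and env.allow_reroute and not stale
    return Action(rate_mbps=rate, reroute=reroute)


if __name__ == "__main__":
    env = PolicyEnvelope(version=7, min_rate_mbps=10.0, max_rate_mbps=100.0,
                         allow_reroute=True, issued_at=0.0, max_age_s=1.0)
    print(clip_action(Action(rate_mbps=250.0, reroute=True), env, now=0.5))
    print(clip_action(Action(rate_mbps=250.0, reroute=True), env, now=2.0))
```

With a fresh envelope, the over-limit request is clamped to 100 Mbps and the reroute is kept. With a stale envelope, the rate bound still applies but the reroute is dropped, which mirrors the behaviour the text describes for stale envelopes.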

Learning Formulation, Safety, and Coordination

Intra-path learning operates on path aggregates rather than individual flows. The state vector captures utilization, queue, loss, ECN marks, throughput, delay, and their changes between intervals. The action space covers discrete metering, queue priority adjustment, and reroute triggers for elephant flows. Utility function weights are distributed by the controller and adjusted per service class, shifting each agent's local objective as SLA status or spare capacity changes. Empirical labels are continuously refreshed through envelope-restricted probes, so learning and execution remain strictly bounded. Controller arbitration serializes conflicting agent actions at shared bottlenecks. Envelope compliance is formally guaranteed, whereas global SLA invariants depend on timely synchronization and drift detection.
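As a rough illustration of how controller-distributed weights could shape an agent's local objective, the sketch below scores probed actions with a weighted utility and picks the best one. The linear form of `utility`, the field names, the assumption that state values are normalized, and the `best_probe` helper are all hypothetical; the source does not give the actual utility function.

```python
from typing import Callable, Dict, Sequence, TypeVar

A = TypeVar("A")


def utility(state: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed linear utility: reward throughput, penalize delay, loss, queue."""
    reward = weights["throughput"] * state["throughput"]
    penalty = (
        weights["delay"] * state["delay"]
        + weights["loss"] * state["loss"]
        + weights["queue"] * state["queue"]
    )
    return reward - penalty


def best_probe(
    candidates: Sequence[A],
    measure: Callable[[A], Dict[str, float]],
    weights: Dict[str, float],
) -> A:
    """Probe each in-envelope candidate once and return the highest-utility one."""
    return max(candidates, key=lambda a: utility(measure(a), weights))


if __name__ == "__main__":
    # Toy measurement model standing in for real telemetry (an assumption).
    def fake_measure(rate: float) -> Dict[str, float]:
        return {
            "throughput": rate,
            "delay": rate ** 2,
            "loss": 0.2 if rate >= 1.0 else 0.0,
            "queue": rate,
        }

    w = {"throughput": 1.0, "delay": 0.5, "loss": 2.0, "queue": 0.1}
    print(best_probe([0.25, 0.5, 1.0], fake_measure, w))
```

In the toy model the middle rate wins because the highest rate incurs loss and delay penalties that outweigh its throughput. Changing the weights per service class, as the text describes, would move that optimum without changing the agent code.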

Empirical Evaluation and Numerical Results

Extensive evaluation on a 1,024-host AWS Clos-fabric software SDN testbed demonstrates significant improvements across multiple axes:

Link Utilization: PolicyCache-SDN achieves 84% average core link utilization, yielding 35.5% improvement over Static ECMP and 18.3% over Centralized TE. The worst-link maximum is only 2.4 percentage points above average, indicating tight utilization distribution. End-host PolicyCache and offline-RL baselines such as Aurora-SDN fail to maintain utilization under distributional shift, confirming the necessity for in-network, online adaptation.
Figure 3: PolicyCache-SDN yields highest average utilization with minimal max-average gap versus all baseline schemes.
Flow Completion Time (FCT): Elephant FCT is reduced by 33.2% (mean) and 40.3% (P99) relative to Static ECMP, and by 18.7% (mean) and 24.1% (P99) relative to Centralized TE. Mice flows benefit concurrently: PolicyCache-SDN shifts both elephant and mice CDFs leftward, improving both median and tail.

Figure 4: Elephant FCT CDF is left-shifted by 33% over Static ECMP; mice flows also improve.

Figure 5: Mean and P99 FCT summary, highlighting PolicyCache-SDN's superior central and tail performance over all software baselines.

Tail Latency and SLA Violations: PolicyCache-SDN reduces P99 delay for real-time flows by 37.7% relative to Static Meter and 11.7% over Centralized TE. SLA violation rate drops from 18.2% to 6.8% (62.6% reduction), outperforming all baselines. Notably, Aurora-SDN’s violation rate doubles under out-of-distribution traffic.
Figure 6: PolicyCache-SDN rapidly stabilizes P99 delay after traffic matrix shift, outperforming offline RL and other baselines.

Figure 7: PolicyCache-SDN achieves the lowest tail latency and SLA violation rate under mixed real-time workloads.
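The 62.6% figure quoted for SLA violations is a relative reduction, which can be checked directly from the two reported rates (18.2% and 6.8%); the absolute drop is 11.4 percentage points:

```latex
\[
\text{relative reduction} = \frac{18.2 - 6.8}{18.2} = \frac{11.4}{18.2} \approx 0.626 = 62.6\%,
\qquad
\text{absolute reduction} = 18.2 - 6.8 = 11.4\ \text{pp}
\]
```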

Convergence, Scalability, Overhead, and Coordination

All agents reach execution mode within 400 ms of a cold start, and post-drift convergence remains below 350 ms at P95. CPU and memory usage remain minimal (<2.1% and <13.4 MB per agent). Controller arbitration suppresses bottleneck oscillations: reroute-flip events drop from 4.3/s to 0.4/s, and utilization variance narrows from ±22 pp to ±5.1 pp. PolicyCache-SDN scales to 64 agents, sustaining 84% utilization; baselines plateau.
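One simple way a controller could serialize conflicting reroutes is to admit at most one request per shared bottleneck in each arbitration round. The sketch below is an assumption-laden illustration of that idea: the `RerouteRequest` type, the priority field, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, NamedTuple


class RerouteRequest(NamedTuple):
    agent_id: str
    bottleneck: str   # identifier of the shared link the reroute would load
    priority: float   # assumed: higher value wins


def arbitrate(requests: List[RerouteRequest]) -> Dict[str, RerouteRequest]:
    """Admit one reroute per bottleneck per round.

    Ties on priority are broken by the lexicographically smaller agent_id so
    the outcome is deterministic. Losing requests would be retried in a later
    round, which serializes changes on each shared link.
    """
    winners: Dict[str, RerouteRequest] = {}
    for req in requests:
        cur = winners.get(req.bottleneck)
        if cur is None or (-req.priority, req.agent_id) < (-cur.priority, cur.agent_id):
            winners[req.bottleneck] = req
    return winners


if __name__ == "__main__":
    round_requests = [
        RerouteRequest("agent-3", "core-1", 0.4),
        RerouteRequest("agent-1", "core-1", 0.9),
        RerouteRequest("agent-2", "core-2", 0.5),
    ]
    for link, req in sorted(arbitrate(round_requests).items()):
        print(link, req.agent_id)
```

Admitting a single change per bottleneck per round prevents two agents from simultaneously shifting elephants onto the same link, which is the kind of interaction that would otherwise produce reroute flips.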

Figure 8: (a) Agent convergence time distribution; (b) Average link utilization increases with agent count, plateauing at 84% for PolicyCache-SDN.

Figure 9: Convergence and per-agent overhead remain low even as scale and stress increase.

Component Ablations and Sensitivity Analysis

Ablation studies confirm the incremental benefit of each action type: disabling rerouting lowers average utilization by 7.7 pp, disabling queue priority adjustment lowers it by 3.9 pp, and disabling metering lowers it by 1.9 pp. Interval and rate step sizes directly trade adaptation speed for SLA compliance. Parameter sensitivity is low; convergence is robust under traffic shifts and envelope staleness.

Figure 10: Rerouting contributes most to utilization; queue priority adjustment and metering provide additive gains.

Figure 11: Centralized TE update-interval sensitivity: faster loops improve TE but at high CPU cost; split agent-envelope design is consistently superior.

Robustness and Stress Testing

PolicyCache-SDN maintains compliance and performance under envelope staleness, telemetry loss, delayed actions, controller outages, traffic oscillation, and excessive rerouting. During a controller outage, agents continue to act within the last valid envelope but issue no new reroutes, so enforcement is resilient to control plane failures.

Figure 12: Robustness heatmap shows minimal degradation across utilization, tail latency, SLA violations, and disorder under multiple stress scenarios.

Theoretical and Practical Implications

By lifting intra-flow locality to intra-path aggregates, PolicyCache-SDN enables safe, composable online learning in SDN fabrics. The controller–agent abstraction addresses non-trivial coordination and arbitration in the presence of reroutes and shared bottlenecks, which is critical for multi-agent distributed learning. The policy envelope concept generalizes to other SDN actions and multi-tenant fabrics, promoting scalable, auditable control over fast reaction cycles. PolicyCache-SDN outperforms both classical and learned baselines in the reported experiments, providing empirical evidence that static policies and offline-trained RL are insufficient in the presence of dynamic and unpredictable traffic matrices.

Future Directions

PolicyCache-SDN opens several avenues for further research: hardware-fabric validation using P4 and SmartNICs, integration of adaptive MARL and online-RL baselines, expansion to multi-domain and inter-AS traffic engineering, and rigorous analysis of multi-agent learning with discrete control actions. The envelope abstraction may also apply to broader network management and programmable data-plane learning (e.g., in-network classifiers, decentralized RL in federated environments).

Conclusion

PolicyCache-SDN renders SDN traffic control adaptive, auditable, and robust via hierarchical intra-path online learning. Empirical results demonstrate significant, quantifiable gains in utilization, completion times, tail latency, SLA compliance, scalability, and robustness. The policy envelope mechanism and controller–agent abstraction provide a practical path forward for scalable, safe, and flexible traffic engineering in modern datacenter and wide-area SDN deployments (2605.09473).
