AI-Native Traffic Engineering

Updated 17 November 2025

AI-Native Traffic Engineering is a paradigm that integrates machine learning, reinforcement learning, and graph neural networks to learn routing decisions directly from network data.
It employs end-to-end differentiable methods, iterative algorithm learning, and multi-agent RL to adapt across diverse network topologies and dynamic traffic patterns in real time.
Recent approaches demonstrate significant improvements in scalability, reaction speed, and solution quality compared to classical optimization techniques, enabling practical, large-scale deployment.

AI-Native Traffic Engineering (TE) refers to a paradigm where modern machine learning, especially deep learning, graph neural networks, reinforcement learning, and differentiable programming, are used to replace or augment classical optimization-based TE methodologies. Central to AI-native TE is the principle of learning the entire control process—algorithms, routing decisions, and adaptation policies—directly from network data, traffic matrices, or operational feedback, creating systems able to generalize across diverse network topologies, dynamically respond to shifting traffic patterns, and operate at sub-second timescales. Recent work demonstrates substantial improvements in scalability, reaction time, and solution quality over traditional solvers, with models frequently trained on small synthetic instances and transferred zero-shot to large, real topologies.

1. Formulations and Objectives in AI-Native Traffic Engineering

AI-native TE solutions inherit the mathematical rigor of classical network optimization but integrate these objectives into trainable, differentiable architectures. The canonical TE formulation on a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with edge capacities $c_k$ , demands $d_{u,v}$ , and path incidence vectors $p_{u,v}(w)$ is:

$\min_w \max_k \rho_k(w), \qquad \rho(w) = (d^\top P(w)) \odot (1/c),$

where $P(w)$ encodes integer-valued routing paths given link weights $w$ . Variants optimize either maximum link utilization (MinMaxLoad), total throughput, or multi-objective cost-delay-fairness tradeoffs. AI-native TE research integrates these goals into:

Fully differentiable surrogates using graph neural networks (GNNs), e.g. Routing-by-Backprop (RBB) (Rusek et al., 2022), enabling optimization via gradient descent.
Edge-level dual variable formulation (Geminet (Liu et al., 30 Jun 2025)) supporting scalable iterative updates decoupled from explicit graph representation.
Learning of optimization algorithms themselves (TELGEN (Zhou et al., 31 Mar 2025)): a GNN is aligned with interior-point methods (IPM), allowing fast, topology-agnostic inference.
Multi-agent RL approaches (Teal (Xu et al., 2022), CMRL (Guo et al., 2023)) optimizing decentralized policies per node or demand under global objectives.

2. Machine Learning Architectures and Methodologies

AI-native TE frameworks employ a diversified set of learning methods which fall into several categories:

A. End-to-End Differentiable Routing via GNNs

RBB (Rusek et al., 2022) parameterizes shortest-path routing as a GNN, producing $\hat{P}(w)$ , a soft $[0,1]$ path incidence matrix, enabling smooth gradients for fast link-weight optimization.
“Encode–process–decode” GNNs with repeated message-passing capture edge and node attributes in the context of network-wide demands.

B. Iterative Algorithm Learning

Geminet (Liu et al., 30 Jun 2025) eschews path-level output for an iterative module wherein small MLPs update per-edge dual variables ( $\lambda_e$ ), given edge-level statistics (utilization, price, capacity, global MLU). All topology-specific information is handled via sparse multiplications with incidence matrices, rendering inference robust to topology changes.

C. Learning Optimization Trajectories

TELGEN (Zhou et al., 31 Mar 2025) transforms the TE LP into a graph problem where GNN layers are tightly algorithmically aligned with the interior-point barrier method. Each outer GNN loop simulates IPM barrier steps, with strongly supervised training on solver iterates.

D. Reinforcement Learning (RL) and Multi-Agent RL

CMRL (Guo et al., 2023) divides TE into subproblems solved by local agents, trained via centralized critics with difference reward credit assignment, supporting fully decentralized online execution.
Teal (Xu et al., 2022) and Hecate (Al-Najjar et al., 8 Jan 2025) employ MARL where each traffic demand or switch is assigned a local agent, learning decentralized allocation, with global counterfactual rewards assessing joint action performance.
CFR-RL (Zhang et al., 2020) learns selection strategies for critical flows to reroute, balancing load with minimal traffic disturbance via RL-guided search and LP rerouting.

E. Data-Enabled Predictive Control

DeeP-TE (Yin et al., 19 Aug 2025) relies purely on historical routing-load pairs using DeePC; predictions and control updates are generated directly via data-driven MPC, bypassing explicit traffic matrix estimation and classical system identification.

3. Scalability, Generalization, and Topology Agnosticism

A defining hallmark of AI-native TE is explicit scalability across orders of magnitude in network size and heterogeneity, achievable by:

Decoupling model architecture from network topology: Geminet's iterative dual update is independent of node and edge count; only the incidence matrix is exchanged (Liu et al., 30 Jun 2025).
Strong generalization: TELGEN generalizes from training on graphs with up to $100$–$800$ nodes to test networks with $5{,}000$ nodes and $100{,}000$ links, achieving $<3\%$ optimality gap (Zhou et al., 31 Mar 2025).
Parameter and memory savings: Geminet demonstrates $0.04$%–$7$% of the NN parameter count of HARP, using $<10$ GiB vs. $>80$ GiB for state-of-the-art path-centric approaches (Liu et al., 30 Jun 2025).
Transfer learning synergy: RBB’s GNN trained on synthetic random graphs maintains $>$ 98.9% binary path prediction accuracy on real WAN topologies (Rusek et al., 2022).
Fast adaptation: CMRL and Teal report sub-second inference times, matching or surpassing offline LP/ILP solvers in speed and scaling to thousands of nodes (Xu et al., 2022, Guo et al., 2023).

4. Practical Deployment, Performance, and Control-Plane Integration

AI-native TE methods are increasingly validated in emulated or real operational settings:

Frameworks such as Hecate+PolKA interface directly with SDN controllers and P4-programmable switches, using a classifier for base configurations and deep RL for adaptation. The PolKA data plane applies polynomial source-routing, offloading path state and enabling sub-second reaction (Al-Najjar et al., 8 Jan 2025).
Real-time telemetry ingestion and inference loops are critical; e.g., Hecate receives per-link statistics, invokes optimization, and installs new path rules within $<$ 200 ms (Al-Najjar et al., 8 Jan 2025).
RL-driven TE approaches (CMRL, CFR-RL) operate in highly dynamic environments—handling traffic surges and link failures—with up to $41\%$ reduction in max link utilization over OSPF and sub-millisecond inference (Guo et al., 2023, Zhang et al., 2020).
ADMM-based post-processing in Teal efficiently restores solution feasibility after ML inference, with 2–5 rounds yielding optimality close to LP, all on GPU within $<$ 2 seconds for $2\text{k}$ -node networks (Xu et al., 2022).
DeeP-TE achieves near-optimal delay performance and $10\times$ lower routing churn than baseline estimation-based reconfigurers, reflecting graceful adaptation and reliability in practice (Yin et al., 19 Aug 2025).

5. Comparison with Classical Methods and Empirical Performance

Extensive empirical results reveal key performance dimensions:

Approach	Memory (GiB)	Time-to-Optimal (s)	Routing Quality
Geminet	0.02–0.99	59–249	1.01–1.12 normalized MLU
HARP	0.53–10.62	214–1357	1.02–1.14 normalized MLU

TELGEN achieves $<3\%$ optimality gap on $5\text{k}$ -node graphs and reduces training/prediction time by $2$–$4$ orders of magnitude over prior schemes (Zhou et al., 31 Mar 2025).
Teal matches or exceeds optimal demand satisfaction while outperforming commercial solvers by $197$– $625\times$ in runtime on large WANs (Xu et al., 2022).
CMRL achieves $33$%–$41$% max-utilization reduction vs OSPF, robust adaptation under failures, and $<$ 0.53 ms inference (Guo et al., 2023).
DeeP-TE outperforms tomogravity-based adaptive routing by $20$% in delay and $10\times$ in routing stability (Yin et al., 19 Aug 2025).
CFR-RL reduces disturbance ( $<$ 21\% $of traffic rerouted) while retaining$ >95 $\% of optimality obtainable with global rerouting (<a href="/papers/2004.11986" title="" rel="nofollow" data-turbo="false" class="assistant-link" x-data x-tooltip.raw="">Zhang et al., 2020</a>).</li> </ul> <h2 class='paper-heading' id='emerging-paradigms-agentic-ai-llm-driven-and-multi-objective-optimization'>6. Emerging Paradigms: Agentic AI, LLM-Driven and Multi-Objective Optimization</h2> <p>Recent thrusts describe autonomous, agentic TE, where LLMs embed high-level reasoning and multi-objective optimization as runtime services (FlexNGIA 2.0 (<a href="/papers/2509.02124" title="" rel="nofollow" data-turbo="false" class="assistant-link" x-data x-tooltip.raw="">Zhani et al., 2 Sep 2025</a>)):</p> <ul> <li>TE is orchestrated by a Resource-Allocation Agent, consuming fine-grained telemetry and employing <a href="https://www.emergentmind.com/topics/chain-of-thought-cot-6c32f039-72dd-425a-88b0-8f648cabbe4e" title="" rel="nofollow" data-turbo="false" class="assistant-link" x-data x-tooltip.raw="">chain-of-thought</a> prompts to dynamically choose weights for cost, profit, utilization, fairness, and energy penalty objectives.</li> <li>Multi-agent communication (IFA, RA agent, SFC agent) coordinates TE decisions and pushes policies to SDN controllers in closed feedback loops.</li> <li>Preliminary experiments confirm$ 20 $–$ 25\% $improvements in green-energy usage,$ 7.9\%$ higher utilization fairness, and rapid adaptation to workload surges.
The system allows for prompt-format evolution, exploratory reasoning (Tree-of-Thoughts), and potential integration of solver distillation or GNN-based policies for real-time operation at scale.

7. Limitations, Open Challenges, and Directions for Research

Several limitations and open topics are highlighted:

Surrogate–objective mismatch: continuous relaxation losses (e.g., soft-max) may not perfectly reflect discrete TE objectives, requiring careful tuning of temperatures or learning rates (Rusek et al., 2022).
Training overhead: offline training of GNNs or RL agents may take hours, though inference is fast post-deployment (Rusek et al., 2022, Xu et al., 2022).
Deployment safety: AI-native TE must validate proposals for loop-free, in-band feasibility and provide fail-safe reversion to classical protocols if performance worsens (Rusek et al., 2022).
Inter-agent and cross-layer coordination: ensuring TE decisions do not conflict with transport or service-layer goals in agentic architectures (Zhani et al., 2 Sep 2025).
Model robustness to topology and traffic drift: Geminet, TELGEN, and DeeP-TE focus on agnostic models and data-driven predictive control but acknowledge the need for further generalization to extreme scenarios or hardware constraints (Liu et al., 30 Jun 2025, Yin et al., 19 Aug 2025).
Integration of richer objectives: extending learned policies to incorporate delay, jitter, energy, risk, and multi-domain coordination.

A plausible implication is that future AI-native TE systems may synthesize classical optimization expertise, neural algorithm distillation, real-time learning, and agentic reasoning into unified platforms capable of operating safely and efficiently at Internet scale. Continuing research is required on LLM reliability benchmarks, architectural distillation, and robust on-the-fly generalization.