Adaptive Minimal Routing in 2D Torus Networks
- The paper introduces an RL-based adaptive minimal routing strategy that exploits wrap-around connectivity to ensure minimal paths even under stochastic faults.
- The paper details a decentralized Proximal Policy Optimization (PPO) implementation that significantly boosts throughput and Packet Delivery Ratio compared to classical odd-even approaches.
- The paper demonstrates the practical significance of integrating RL agents in 2D torus NoCs, achieving up to 67% gain in fault resilience and improved fault adaptive scores.
Adaptive minimal routing in 2D torus networks is a network-on-chip (NoC) methodology that ensures packets traverse minimal-length paths, subject to the unique wrap-around connectivity of the torus topology. In the presence of node faults, adaptive minimal routing aims to maintain high throughput and reliability by dynamically exploiting available minimal paths. Recent research has contrasted a reinforcement learning (RL)-driven strategy with a classical adaptive baseline in this domain, demonstrating significant fault resilience and superior performance for the RL method in large-scale torus NoCs (Charrwi et al., 15 Dec 2025).
1. 2D Torus Network Model and Minimal Routing Constraints
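The 2D torus NoC consists of an $N \times N$ grid graph $G = (V, E)$, where $V = \{(x, y) : 0 \le x, y < N\}$ and each node $(x, y)$ connects to its four undirected neighbors:
- $((x+1) \bmod N,\ y)$ and $((x-1) \bmod N,\ y)$ in the X-dimension
- $(x,\ (y+1) \bmod N)$ and $(x,\ (y-1) \bmod N)$ in the Y-dimension
Wrap-around (the $\bmod\ N$ arithmetic) ensures all edge nodes are cyclically connected, reducing network diameter and enhancing path diversity.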
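Minimal routing requirement: For source $s = (x_s, y_s)$ and destination $d = (x_d, y_d)$, the minimal coordinate-wise distances are:
$$\Delta x = \min\bigl(|x_d - x_s|,\ N - |x_d - x_s|\bigr), \qquad \Delta y = \min\bigl(|y_d - y_s|,\ N - |y_d - y_s|\bigr)$$
The total minimal distance is $D(s, d) = \Delta x + \Delta y$. A minimal route must reduce $D$ at every hop: at node $c$, any output port $p$ leading to neighbor $c'$ must satisfy $D(c', d) = D(c, d) - 1$, ensuring deadlock-avoidance and path optimality.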
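The distance definition and the per-hop minimality condition can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's code: the function names and the convention that North increases the Y coordinate are assumptions.

```python
def torus_delta(a, b, n):
    """Shortest distance between coordinates a and b on a ring of size n."""
    diff = abs(a - b) % n
    return min(diff, n - diff)

def torus_distance(src, dst, n):
    """Total minimal hop count D between two nodes of an n x n torus."""
    return torus_delta(src[0], dst[0], n) + torus_delta(src[1], dst[1], n)

def neighbors(node, n):
    """Four wrap-around neighbors (assumed convention: N is +y, E is +x)."""
    x, y = node
    return {
        "E": ((x + 1) % n, y),
        "W": ((x - 1) % n, y),
        "N": (x, (y + 1) % n),
        "S": (x, (y - 1) % n),
    }

def minimal_ports(cur, dst, n):
    """Ports whose neighbor is exactly one hop closer to dst."""
    d = torus_distance(cur, dst, n)
    return [port for port, nxt in neighbors(cur, n).items()
            if torus_distance(nxt, dst, n) == d - 1]

# Wrap-around makes (7, 7) only two hops from (0, 0) on an 8 x 8 torus.
print(torus_distance((0, 0), (7, 7), 8))   # 2
print(minimal_ports((0, 0), (7, 7), 8))    # ['W', 'S']
# When the offset is exactly n/2, both directions of a ring are minimal.
print(minimal_ports((0, 0), (4, 0), 8))    # ['E', 'W']
```

The last call shows one source of path diversity in a torus: a destination halfway around a ring can be reached minimally in either direction.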
2. RL-Driven Adaptive Minimal Routing Approach
Each router is modeled as a decentralized RL agent employing Proximal Policy Optimization (PPO), with local state, action, and reward formulations:
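State $s_t$: Comprises
- Buffer occupancies $b_i$ for ports $i \in \{\text{N}, \text{S}, \text{E}, \text{W}\}$
- Fault flags $f_i \in \{0, 1\}$, indicating neighbor faults
- Relative destination coordinates $\Delta x$ and $\Delta y$ in the X and Y dimensions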
Action space $\mathcal{A}(s_t)$: Permitted directions (out of North, South, East, West) must satisfy both:
- Minimality (the hop reduces $D$ by one)
- Non-faulty neighbor ($f_p = 0$ for the chosen port $p$)
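A minimal sketch of how a router might assemble its local state and legal-action mask is shown below. The vector layout, the use of signed shortest offsets for the relative coordinates, and the names `observe` and `occupancy` are assumptions made for illustration; the source does not specify the encoding.

```python
PORTS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def ring(a, b, n):
    """Shortest distance between a and b on a ring of size n."""
    d = abs(a - b) % n
    return min(d, n - d)

def signed_offset(a, b, n):
    """Signed shortest offset from a to b on a ring of size n (assumed encoding)."""
    d = (b - a) % n
    return d - n if d > n // 2 else d

def observe(cur, dst, faulty, occupancy, n):
    """Return (state vector, legal-port mask) for the router at cur.

    occupancy maps each port name to a buffer occupancy (hypothetical input).
    """
    here = ring(cur[0], dst[0], n) + ring(cur[1], dst[1], n)
    buffers, flags, mask = [], [], []
    for port, (sx, sy) in PORTS.items():
        nxt = ((cur[0] + sx) % n, (cur[1] + sy) % n)
        there = ring(nxt[0], dst[0], n) + ring(nxt[1], dst[1], n)
        is_faulty = nxt in faulty
        buffers.append(occupancy[port])
        flags.append(int(is_faulty))
        # Legal only if the hop is minimal AND the neighbor is healthy.
        mask.append(int(there < here and not is_faulty))
    dx = signed_offset(cur[0], dst[0], n)
    dy = signed_offset(cur[1], dst[1], n)
    return buffers + flags + [dx, dy], mask

state, mask = observe((0, 0), (7, 7), {(7, 0)},
                      {"N": 0.5, "S": 0.0, "E": 0.25, "W": 1.0}, 8)
print(state)  # [0.5, 0.0, 0.25, 1.0, 0, 0, 0, 1, -1, -1]
print(mask)   # [0, 1, 0, 0]
```

Here West would be a minimal move, but its neighbor is faulty, so only South remains legal.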
Reward function $r_t$:
- Delivery: $+100 + 50f$, where $f$ is the fault density
- Hop: $-1$
- Faulty neighbor access: $-50$ (terminal)
- Dead-end: $-20$ (terminal)
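The reward cases can be written as a single function. The numeric values below are taken from the per-router pseudocode in this section; the function signature and argument names are hypothetical.

```python
def step_reward(next_node, dst, faulty, has_legal_action, fault_density):
    """Return (reward, terminal) for one routing step.

    Values follow the per-router pseudocode: -20 for a dead end, -50 for
    stepping onto a faulty node, 100 + 50*f on delivery, and -1 per hop.
    """
    if not has_legal_action:
        return -20.0, True            # dead end: no minimal healthy port
    if next_node in faulty:
        return -50.0, True            # moved onto a faulty neighbor
    if next_node == dst:
        return 100.0 + 50.0 * fault_density, True
    return -1.0, False                # ordinary hop

print(step_reward((1, 1), (1, 1), set(), True, 0.5))   # (125.0, True)
print(step_reward((1, 0), (1, 1), set(), True, 0.5))   # (-1.0, False)
```

The delivery bonus grows with the fault density, so a successful delivery in a heavily faulted network is rewarded more than one in a healthy network.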
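PPO implementation: The policy $\pi_\theta$ and value function $V_\phi$ are shallow MLPs (single hidden layer, 32 units). Training uses PPO's clipped objective with batch updates after every 64 episodes, over 5000 training episodes in total; the clipping threshold, learning rate, and remaining hyperparameter values are given in the paper (Charrwi et al., 15 Dec 2025).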
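For reference, the standard PPO clipped surrogate is $L^{\text{CLIP}}(\theta) = \mathbb{E}_t\bigl[\min\bigl(r_t(\theta) A_t,\ \operatorname{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\, A_t\bigr)\bigr]$ with probability ratio $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_\text{old}}(a_t \mid s_t)$. The sketch below evaluates it in plain Python. The value $\epsilon = 0.2$ in the example calls is a commonly used default and an assumption here, not a value confirmed by the source.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps):
    """Mean clipped surrogate over a batch of (log-prob, advantage) samples."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                  # r_t(theta)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)           # pessimistic bound
    return total / len(advantages)

# Ratio 2 with a positive advantage is clipped at 1 + eps.
print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0], 0.2))
# Ratio 2 with a negative advantage is not clipped: the penalty is kept in full.
print(ppo_clip_objective([math.log(2.0)], [0.0], [-1.0], 0.2))
```

Taking the minimum of the clipped and unclipped terms removes the incentive to push the policy far from the one that collected the data, which is what makes the update conservative.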
Per-router pseudocode:
    initialize θ, φ randomly
    loop over episodes:
        inject random fault set F of density f
        sample source s and destination d (both non-faulty)
        current c ← s
        clear trajectory buffer T
        repeat:
            observe sₜ ← [bᵢ, fᵢ, Δx, Δy] at router c
            A(sₜ) ← {ports p | move is minimal AND p not in F}
            if A(sₜ) == ∅:
                assign rₜ = -20; store (sₜ, _, rₜ, terminal) in T; break
            choose aₜ ~ π_θ(·|sₜ)
            execute move c → p = aₜ
            if p == d: rₜ = 100 + 50f; store (sₜ, aₜ, rₜ, terminal); break
            if p in F: rₜ = -50; store (sₜ, aₜ, rₜ, terminal); break
            else: rₜ = -1; observe s_{t+1}; store (sₜ, aₜ, rₜ, s_{t+1})
        until terminal
        if collected K episodes:
            perform PPO update on θ, φ using buffer of trajectories
    end loop
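The episode loop above can be made runnable with a stand-in policy. The sketch below is an assumption-laden illustration rather than the paper's implementation: it replaces the learned policy $\pi_\theta$ with a uniform random choice over legal ports (or a caller-supplied hypothetical `policy` callable), and it omits buffer occupancies and the PPO update. Because faulty neighbors are masked out of the action set, the $-50$ branch cannot fire in this version.

```python
import random

PORTS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def ring(a, b, n):
    d = abs(a - b) % n
    return min(d, n - d)

def dist(u, v, n):
    return ring(u[0], v[0], n) + ring(u[1], v[1], n)

def run_episode(n, fault_density, rng, policy=None):
    """Roll out one episode; returns a list of (node, port, reward, terminal)."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    faulty = set(rng.sample(nodes, int(fault_density * len(nodes))))
    healthy = [v for v in nodes if v not in faulty]
    src, dst = rng.sample(healthy, 2)          # distinct, both non-faulty
    cur, trajectory = src, []
    while True:
        # Legal ports: minimal hop AND healthy neighbor.
        legal = {}
        for port, (dx, dy) in PORTS.items():
            nxt = ((cur[0] + dx) % n, (cur[1] + dy) % n)
            if dist(nxt, dst, n) < dist(cur, dst, n) and nxt not in faulty:
                legal[port] = nxt
        if not legal:                          # dead end
            trajectory.append((cur, None, -20.0, True))
            return trajectory
        port = policy(cur, dst, legal) if policy else rng.choice(sorted(legal))
        nxt = legal[port]
        if nxt == dst:                         # delivered
            trajectory.append((cur, port, 100.0 + 50.0 * fault_density, True))
            return trajectory
        trajectory.append((cur, port, -1.0, False))
        cur = nxt

rng = random.Random(0)
episode = run_episode(8, 0.3, rng)
print(len(episode), episode[-1][2])
```

Every step strictly reduces the distance to the destination, so an episode on an $8 \times 8$ torus ends after at most 8 steps, either by delivery or at a dead end.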
3. Baseline Adaptive Routing: Odd–Even Minimal Scheme
The baseline employs an Odd–Even turn model tailored for torus:
- Phase 1: X-dimension routing to minimize east/west hops with wrap-around, subject to turn restrictions
- Phase 2: Y-dimension routing in similar fashion
- Packets are dropped if minimal directions in a dimension are both fault-blocked
No congestion awareness or dynamic detours outside minimal XY routing are permitted.
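The two-phase structure and the drop rule can be illustrated with a simplified dimension-ordered router. This sketch deliberately omits the Odd–Even turn restrictions, so it is not the baseline itself; it only shows X-then-Y minimal routing with wrap-around and a drop when every minimal direction in the current dimension is blocked. All names are hypothetical.

```python
def ring_steps(a, b, n):
    """Minimal step directions (+1 / -1) from a to b on a ring of size n."""
    fwd = (b - a) % n
    if fwd == 0:
        return []
    bwd = n - fwd
    if fwd < bwd:
        return [1]
    if bwd < fwd:
        return [-1]
    return [1, -1]                     # tie: both directions are minimal

def route_xy(src, dst, faulty, n):
    """Return the node path from src to dst, or None if the packet is dropped."""
    cur, path = src, [src]
    for axis in (0, 1):                # phase 1: X, phase 2: Y
        while cur[axis] != dst[axis]:
            for step in ring_steps(cur[axis], dst[axis], n):
                nxt = list(cur)
                nxt[axis] = (cur[axis] + step) % n
                nxt = tuple(nxt)
                if nxt not in faulty:
                    break
            else:
                return None            # all minimal directions are fault-blocked
            cur = nxt
            path.append(cur)
    return path

print(route_xy((0, 0), (2, 1), set(), 8))      # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(route_xy((0, 0), (2, 1), {(1, 0)}, 8))   # None
print(route_xy((0, 0), (4, 0), {(1, 0)}, 8))   # [(0, 0), (7, 0), (6, 0), (5, 0), (4, 0)]
```

The second call is dropped even though a minimal path through $(0, 1)$ exists, which is the kind of case an adaptive router choosing among all minimal ports can still deliver.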
4. Evaluation Metrics
Three principal metrics are used to compare routing algorithms under stochastic faults and load:
| Metric | Definition | Range/Comment |
|---|---|---|
| Throughput | (Packets delivered over $T$ cycles) / (offered load $\times$ $T$) | Normalized; offered load is per cycle |
| Packet Delivery Ratio (PDR) | DeliveredPackets / InjectedPackets | Fractional delivery success |
| Fault Adaptive Score (FT) | Fraction of connected $(s, d)$ pairs in the residual graph that are routed successfully | $\mathrm{FT} \in [0, 1]$; drops as faults increase |
These metrics capture not only aggregate packet delivery and network utilization, but also fault-aware resilience across all feasible routes.
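As a rough sketch of how PDR and FT could be computed, the code below counts connected source-destination pairs in the residual torus with a breadth-first search and scores a routing function against them. The `route` argument is a hypothetical callable returning a truthy value on successful delivery; the exact evaluation procedure used in the paper is not specified here.

```python
from collections import deque

def packet_delivery_ratio(delivered, injected):
    """Delivered packets divided by injected packets."""
    return delivered / injected if injected else 0.0

def reachable(src, faulty, n):
    """Healthy nodes reachable from src in the residual torus (BFS)."""
    seen, queue = {src}, deque([src])
    while queue:
        x, y = queue.popleft()
        for nxt in (((x + 1) % n, y), ((x - 1) % n, y),
                    (x, (y + 1) % n), (x, (y - 1) % n)):
            if nxt not in faulty and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def fault_adaptive_score(route, faulty, n):
    """Fraction of connected healthy (s, d) pairs for which route(s, d) succeeds."""
    healthy = [(x, y) for x in range(n) for y in range(n) if (x, y) not in faulty]
    connected = succeeded = 0
    for s in healthy:
        component = reachable(s, faulty, n)
        for d in healthy:
            if d != s and d in component:
                connected += 1
                succeeded += bool(route(s, d))
    return succeeded / connected if connected else 0.0

print(packet_delivery_ratio(58, 100))                              # 0.58
print(fault_adaptive_score(lambda s, d: s[0] == d[0], set(), 4))   # 0.2
```

Restricting the denominator to pairs that remain connected separates losses caused by the routing algorithm from losses that no algorithm could avoid.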
5. Simulation and Experimental Setup
Experiments are conducted on an $8 \times 8$ torus (64 routers) using wormhole switching and 4-flit input buffers per port. Traffic is constant-bit-rate with uniform random destinations, while fault sets are injected uniformly at random at the fault densities $f$ covered in the results ($0.1$ to $0.5$). Offered load is varied from 0.1 to 1.0 to evaluate throughput in both low- and high-load regimes. The RL agent is trained, and both routers are evaluated, under identical hardware constraints and input distributions.
6. Performance Results and Analysis
The RL-based adaptive minimal router outperforms the Odd–Even baseline across diverse fault and traffic conditions:
Throughput:
- Baseline saturates at no more than $0.59$
- RL-PPO achieves near-unity throughput (up to $1.00$) across all loads
- The gain over the baseline is most evident under heavy load
PDR at increasing fault density:
| Fault density $f$ | PDR_OE (Odd–Even) | PDR_RL (RL agent) | Relative Gain |
|---|---|---|---|
| 0.2 | 0.58 | 0.64 | +10% |
| 0.3 | 0.42 | 0.52 | +24% |
| 0.4 | 0.30 | 0.38 | +27% |
| 0.5 | 0.15 | 0.25 | +67% |
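The relative-gain column is consistent with $(\text{PDR}_\text{RL} - \text{PDR}_\text{OE}) / \text{PDR}_\text{OE}$, rounded to the nearest percent. The short check below recomputes it from the tabulated PDR values; the variable and function names are illustrative.

```python
# (PDR_OE, PDR_RL) per fault density, copied from the table above.
pdr = {0.2: (0.58, 0.64), 0.3: (0.42, 0.52), 0.4: (0.30, 0.38), 0.5: (0.15, 0.25)}

def relative_gain_pct(baseline, rl):
    """Relative improvement of rl over baseline, in whole percent."""
    return round(100.0 * (rl - baseline) / baseline)

gains = {f: relative_gain_pct(oe, rl) for f, (oe, rl) in pdr.items()}
print(gains)  # {0.2: 10, 0.3: 24, 0.4: 27, 0.5: 67}
```

The absolute differences are smaller and more uniform (6, 10, 8, and 10 percentage points), so the large relative gain at $f = 0.5$ partly reflects the baseline's low PDR at that density.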
Fault Adaptive Score (FT):
The RL agent sustains $\mathrm{FT} \ge 0.25$ up to $f = 0.5$; the baseline collapses above $f = 0.2$.
| Fault density $f$ | FT_OE | FT_RL |
|---|---|---|
| 0.1 | 0.61 | 0.80 |
| 0.2 | 0.56 | 0.60 |
| 0.3 | 0.20 | 0.50 |
| 0.4 | 0.10 | 0.38 |
| 0.5 | 0.03 | 0.25 |
Key factors underlying results:
- RL agent exploits path diversity, balancing wrap-around and non-trivial detours, while maintaining minimality
- Fault-aware reward scaling promotes robust avoidance of isolated clusters
- Strict minimal-hop eligibility in action space avoids deadlocks but enables RL-discovered alternative minimal routes, not encodable via static XY heuristics
- PPO stability enables convergence in 500 episodes with consistent fault/load performance
This suggests that RL-based decentralized routing can leverage network state and topology awareness to maximize packet delivery and throughput, even as static turn-based algorithms sharply degrade under rising fault densities.
7. Context and Significance
Charrwi & Hussain's findings systematically demonstrate the advantages of RL-driven adaptive minimal routing in 2D torus NoCs, notably under practical scenarios of dynamic faults and varying traffic. The RL router’s decentralized operation, guided by locally observable state and minimal path eligibility, enables the network to self-adapt to stochastic conditions, with a throughput gain of at least $20\%$, up to $67\%$ higher Packet Delivery Ratio, and improved Fault Adaptive Scores relative to traditional schemes.
A plausible implication is that the inclusion of RL agents at each router allows global network metrics (connectivity, path diversity, congestion) to be implicitly factored into local routing actions via episodic reward shaping, bridging the gap between static minimal routing and dynamic fault resilience.
These findings are directly relevant for next-generation NoC architectures requiring scalable, fault-tolerant, and high-throughput communication substrates (Charrwi et al., 15 Dec 2025).