Adaptive Minimal Routing in 2D Torus Networks

Updated 22 December 2025

The paper introduces an RL-based adaptive minimal routing strategy that exploits wrap-around connectivity to ensure minimal paths even under stochastic faults.
The paper details a decentralized Proximal Policy Optimization (PPO) implementation that significantly boosts throughput and Packet Delivery Ratio compared to classical odd-even approaches.
The paper demonstrates the practical significance of integrating RL agents in 2D torus NoCs, achieving up to 67% gain in fault resilience and improved fault adaptive scores.

Adaptive minimal routing in 2D torus networks is a network-on-chip (NoC) methodology that ensures packets traverse minimal-length paths, subject to the unique wrap-around connectivity of the torus topology. In the presence of node faults, adaptive minimal routing aims to maintain high throughput and reliability by dynamically exploiting available minimal paths. Recent research has contrasted a reinforcement learning (RL)-driven strategy with a classical adaptive baseline in this domain, demonstrating significant fault resilience and superior performance for the RL method in large-scale torus NoCs (Charrwi et al., 15 Dec 2025).

1. 2D Torus Network Model and Minimal Routing Constraints

The 2D torus NoC consists of an $M \times N$ grid graph $G=(V,E)$ , where $V=\{(x,y)\mid 0\leq x<M, 0\leq y<N\}$ and each node $(x,y)$ connects to its four undirected neighbors:

$(x\oplus1 \bmod M, y)$ , $(x\ominus1 \bmod M, y)$ in the X-dimension
$(x, y\oplus1 \bmod N)$ , $(x, y\ominus1 \bmod N)$ in the Y-dimension

Wrap-around ( $\oplus, \ominus$ ) ensures all edge nodes are cyclically connected, minimizing network diameter and enhancing path diversity.
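The wrap-around adjacency can be sketched in a few lines of Python. The function name `torus_neighbors` and the port convention (East/West along X, North/South along Y, with East and North as the increasing directions) are assumptions made for illustration; the text does not fix an orientation.

```python
def torus_neighbors(x, y, M, N):
    """Return the four wrap-around neighbors of (x, y) in an M x N torus.

    Port naming (E = x+1, W = x-1, N = y+1, S = y-1) is an assumed convention.
    """
    return {
        "E": ((x + 1) % M, y),
        "W": ((x - 1) % M, y),  # Python's % maps -1 to M-1, giving the wrap-around
        "N": (x, (y + 1) % N),
        "S": (x, (y - 1) % N),
    }


# A corner node of an 8 x 8 torus still has four neighbors.
print(torus_neighbors(0, 0, 8, 8))
```

Because every node has exactly four neighbors, corner and edge routers need no special casing.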

Minimal routing requirement: For source $s=(x_1, y_1)$ and destination $d=(x_2, y_2)$ , minimal coordinate-wise distances are:

$dx = \min\{(x_2-x_1)\bmod M, (x_1-x_2)\bmod M\},\quad dy = \min\{(y_2-y_1)\bmod N, (y_1-y_2)\bmod N\}$

The total minimal distance is $D(s,d)=dx+dy$. A minimal route must reduce $D$ by exactly one at each hop: at node $c_t$, any output port $p$ must satisfy $D(c_t,d) = 1 + D(c_t \rightarrow p, d)$, where $c_t \rightarrow p$ denotes the neighbor reached through port $p$. This ensures deadlock-avoidance and path optimality.
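As a minimal sketch of the distance formula and the per-hop minimality condition, the following Python computes $D(s,d)$ and lists the ports that reduce it by one. The names `torus_distance`, `minimal_ports`, and `MOVES`, as well as the port orientation, are illustrative assumptions.

```python
# Assumed port convention: E = x+1, W = x-1, N = y+1, S = y-1.
MOVES = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def torus_distance(s, d, M, N):
    """D(s, d) = dx + dy with wrap-around in both dimensions."""
    dx = min((d[0] - s[0]) % M, (s[0] - d[0]) % M)
    dy = min((d[1] - s[1]) % N, (s[1] - d[1]) % N)
    return dx + dy


def minimal_ports(c, d, M, N):
    """Ports p at node c with D(c, d) = 1 + D(neighbor via p, d)."""
    here = torus_distance(c, d, M, N)
    ports = []
    for name, (mx, my) in MOVES.items():
        nxt = ((c[0] + mx) % M, (c[1] + my) % N)
        if torus_distance(nxt, d, M, N) == here - 1:
            ports.append(name)
    return ports


print(torus_distance((0, 0), (6, 1), 8, 8))  # 3: two hops west via wrap-around, one north
print(minimal_ports((0, 0), (6, 1), 8, 8))   # ['W', 'N']
print(minimal_ports((0, 0), (4, 0), 8, 8))   # ['E', 'W']: a tie at distance M/2
```

The last call shows why a torus offers more minimal-path diversity than a mesh: when the offset in a dimension equals half the ring size, both directions are minimal.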

2. RL-Driven Adaptive Minimal Routing Approach

Each router is modeled as a decentralized RL agent employing Proximal Policy Optimization (PPO), with local state, action, and reward formulations:

State $s_t$ : Comprises

Buffer occupancies $b_i \in \{0,...,B_\text{max}\}$ for ports $i\in\{\text{N,S,E,W}\}$
Fault flags $f_i\in\{0,1\}$ , indicating neighbor faults
Relative destination coordinates $\Delta x, \Delta y$ spanning $[-\lfloor M/2\rfloor,\lfloor M/2\rfloor]$ and $[-\lfloor N/2\rfloor,\lfloor N/2\rfloor]$

Action space $A(s_t)$ : Permitted directions $p$ (out of North, South, East, West) must satisfy both:

Minimality ( $D(r \rightarrow p, d) = D(r, d) - 1$ )
Non-faulty neighbor ( $f_p=0$ )

Reward function $r_t$ :

Delivery: $r_t = 100 + 50f$
Hop: $r_t = -1$
Faulty neighbor access: $r_t = -50$ (terminal)
Dead-end: $r_t = -20$ (terminal)
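The action filter and reward table above can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: faults are represented as a set of node coordinates, the port orientation is assumed, and the names `permitted_actions` and `reward` are hypothetical.

```python
# Assumed port convention: N = y+1, S = y-1, E = x+1, W = x-1.
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def torus_distance(a, b, M, N):
    dx = min((b[0] - a[0]) % M, (a[0] - b[0]) % M)
    dy = min((b[1] - a[1]) % N, (a[1] - b[1]) % N)
    return dx + dy


def permitted_actions(r, d, faults, M, N):
    """A(s_t): ports that are minimal AND lead to a non-faulty neighbor."""
    base = torus_distance(r, d, M, N)
    allowed = []
    for port, (mx, my) in MOVES.items():
        nxt = ((r[0] + mx) % M, (r[1] + my) % N)
        if nxt in faults:
            continue  # f_p = 1: neighbor is faulty
        if torus_distance(nxt, d, M, N) == base - 1:
            allowed.append(port)
    return allowed


def reward(event, f):
    """Reward values from the list above; f is the fault density."""
    table = {
        "delivery": 100 + 50 * f,
        "hop": -1,
        "fault": -50,     # terminal
        "dead_end": -20,  # terminal
    }
    return table[event]


# East neighbor faulty: only the wrap-around route west remains minimal.
print(permitted_actions((0, 0), (4, 0), {(1, 0)}, 8, 8))          # ['W']
# Both minimal neighbors faulty: empty action set, i.e. a dead end.
print(permitted_actions((0, 0), (4, 0), {(1, 0), (7, 0)}, 8, 8))  # []
```

An empty action set corresponds to the dead-end case, which ends the episode with the $-20$ penalty.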

PPO Implementation: The policy $\pi_\theta(a|s)$ and value function $V_\phi(s)$ are each parameterized by a shallow MLP (single hidden layer, 32 units). The clipped objective uses $\epsilon=0.2$, with $\gamma=0.99$, $\lambda=0.95$, learning rate $\alpha=3\times10^{-4}$, batch updates every 64 episodes, and 5000 training episodes in total.
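To make the roles of $\epsilon$, $\gamma$, and $\lambda$ concrete, here is a small pure-Python sketch of the standard PPO clipped surrogate for one sample, together with generalized advantage estimation (GAE). Reading $\lambda$ as the GAE parameter is an assumption, since the text lists it without naming its role; the function names are illustrative.

```python
import math


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    `values` holds V(s_0..s_T), one more entry than `rewards`;
    the last entry is 0 when the episode ended in a terminal state.
    """
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def clipped_surrogate(logp_new, logp_old, advantage, eps=0.2):
    """Per-sample PPO objective: min(ratio * A, clip(ratio, 1-eps, 1+eps) * A)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# Two hops at -1 each, then delivery at f = 0.2 (reward 110), with an untrained critic (V = 0).
print(gae_advantages([-1, -1, 110], [0, 0, 0, 0]))
# A probability ratio of 1.5 is clipped to 1.2 when the advantage is positive.
print(clipped_surrogate(math.log(1.5), 0.0, 2.0))
```

The clip keeps a single batch from moving the policy too far from the one that collected the data, which is the usual explanation for PPO's training stability.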

Per-router pseudocode:

initialize θ, φ randomly
loop over episodes:
  inject random fault set F of density f
  sample source s and destination d (both non-faulty)
  current c ← s
  clear trajectory buffer T
  repeat:
    observe sₜ ← [bᵢ,fᵢ,Δx,Δy] at router c
    A(sₜ) ← {ports p | move is minimal AND p not in F}
    if A(sₜ)==∅:
      assign rₜ=-20; store (sₜ,_,rₜ,terminal) in T; break
    choose aₜ ~ π_θ(·|sₜ)
    execute move to neighbor p reached via port aₜ
    if p==d: rₜ=100+50f; store (sₜ,aₜ,rₜ,terminal) in T; break
    if p in F: rₜ=-50; store (sₜ,aₜ,rₜ,terminal) in T; break
    else: rₜ=-1; c ← p; observe s_{t+1}; store (sₜ,aₜ,rₜ,s_{t+1}) in T
  until terminal
  if collected K episodes:
    perform PPO update on θ,φ using buffer of trajectories
end loop
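The episode loop can be exercised without a trained network. The sketch below replaces $\pi_\theta$ with a uniform random choice over the masked action set, so it illustrates only the environment dynamics and reward bookkeeping, not PPO itself. Assumptions: faults are `round(f*M*N)` nodes drawn uniformly without replacement, source and destination are distinct, and the return is undiscounted. Because actions are drawn from the masked set, the faulty-neighbor branch ($-50$) cannot occur here.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def torus_distance(a, b, M, N):
    dx = min((b[0] - a[0]) % M, (a[0] - b[0]) % M)
    dy = min((b[1] - a[1]) % N, (a[1] - b[1]) % N)
    return dx + dy


def run_episode(M, N, f, rng):
    """Roll out one episode with a uniform random stand-in for the policy."""
    nodes = [(x, y) for x in range(M) for y in range(N)]
    faults = set(rng.sample(nodes, round(f * M * N)))
    healthy = [n for n in nodes if n not in faults]
    s, d = rng.sample(healthy, 2)  # distinct, non-faulty endpoints
    result = {"min_distance": torus_distance(s, d, M, N)}
    c, ret, hops = s, 0.0, 0
    while True:
        here = torus_distance(c, d, M, N)
        options = []
        for mx, my in MOVES:
            nxt = ((c[0] + mx) % M, (c[1] + my) % N)
            if nxt not in faults and torus_distance(nxt, d, M, N) == here - 1:
                options.append(nxt)
        if not options:  # dead end: no minimal, non-faulty port
            result.update(outcome="dead_end", ret=ret - 20, hops=hops)
            return result
        c = rng.choice(options)  # stand-in for a_t ~ pi_theta(. | s_t)
        hops += 1
        if c == d:  # delivery reward replaces the hop penalty on the last hop
            result.update(outcome="delivered", ret=ret + 100 + 50 * f, hops=hops)
            return result
        ret -= 1  # ordinary hop


rng = random.Random(0)
episodes = [run_episode(8, 8, 0.3, rng) for _ in range(1000)]
delivered = sum(e["outcome"] == "delivered" for e in episodes)
print(f"random minimal policy at f=0.3: {delivered / len(episodes):.2f} delivered")
```

Every hop strictly reduces the distance to the destination, so each episode ends after at most $D(s,d)$ hops.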

3. Baseline Adaptive Routing: Odd–Even Minimal Scheme

The baseline employs an Odd–Even turn model tailored for torus:

Phase 1: X-dimension routing to minimize east/west hops with wrap-around, subject to turn restrictions
Phase 2: Y-dimension routing in similar fashion
Packets are dropped if minimal directions in a dimension are both fault-blocked

No congestion awareness or dynamic detours outside minimal XY routing are permitted.
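A simplified sketch of the two-phase behaviour follows. It models only the X-then-Y ordering, the wrap-around choice of direction, and the drop rule. The Odd–Even turn restrictions are deliberately omitted because the text does not spell them out, so this is not the baseline itself. The names `minimal_steps` and `route_xy` are hypothetical.

```python
def minimal_steps(a, b, size):
    """Minimal step(s) (+1 and/or -1) from a to b on a ring of `size` nodes."""
    fwd = (b - a) % size
    bwd = (a - b) % size
    if fwd == 0:
        return []
    if fwd < bwd:
        return [1]
    if bwd < fwd:
        return [-1]
    return [1, -1]  # tie: both directions are minimal


def route_xy(s, d, faults, M, N):
    """Route X first, then Y; return the path, or None if the packet is dropped."""
    pos = list(s)
    path = [tuple(pos)]
    for axis, size in ((0, M), (1, N)):  # Phase 1: X, Phase 2: Y
        while pos[axis] != d[axis]:
            candidates = []
            for step in minimal_steps(pos[axis], d[axis], size):
                nxt = list(pos)
                nxt[axis] = (pos[axis] + step) % size
                if tuple(nxt) not in faults:
                    candidates.append(nxt)
            if not candidates:
                return None  # every minimal direction in this dimension is blocked
            pos = candidates[0]
            path.append(tuple(pos))
    return path


print(route_xy((0, 0), (6, 1), set(), 8, 8))     # [(0, 0), (7, 0), (6, 0), (6, 1)]
print(route_xy((0, 0), (6, 1), {(7, 0)}, 8, 8))  # None: dropped, although a minimal path via (0, 1) exists
```

The second call illustrates the limitation named above: a fixed dimension order cannot switch to the Y dimension early, even when doing so would still yield a minimal path.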

4. Evaluation Metrics

Three principal metrics are used to compare routing algorithms under stochastic faults and load:

Metric	Definition	Range/Comment
Throughput $\Psi(\rho,f)$	(Delivered packets over $T$ cycles) / $(N \cdot M \cdot T)$	$[0,1]$ normalized; offered load $\rho$ per cycle
Packet Delivery Ratio (PDR)	Delivered packets / Injected packets	Fraction of injected packets that are delivered
Fault Adaptive Score (FT)	Success rate over all connected $(s,d)$ pairs in the residual graph	$FT(0)=1$; drops as faults increase

These metrics capture not only aggregate packet delivery and network utilization, but also fault-aware resilience across all feasible routes.
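The three metrics can be computed as in the sketch below. The FT function reflects one reading of the table's definition: the fraction of ordered source/destination pairs that remain connected in the fault-free residual graph and that the router under test delivers. The callable `succeeds(s, d)` stands in for that router and is hypothetical.

```python
from collections import deque


def throughput(delivered, M, N, T):
    """Delivered packets per node per cycle, normalized to [0, 1]."""
    return delivered / (N * M * T)


def pdr(delivered, injected):
    """Packet Delivery Ratio; defined as 0 when nothing was injected."""
    return delivered / injected if injected else 0.0


def components(faults, M, N):
    """Label each healthy node with a representative of its connected component (BFS)."""
    label = {}
    for x in range(M):
        for y in range(N):
            start = (x, y)
            if start in faults or start in label:
                continue
            label[start] = start
            queue = deque([start])
            while queue:
                cx, cy = queue.popleft()
                for mx, my in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nxt = ((cx + mx) % M, (cy + my) % N)
                    if nxt not in faults and nxt not in label:
                        label[nxt] = start
                        queue.append(nxt)
    return label


def fault_adaptive_score(succeeds, faults, M, N):
    """Fraction of connected (s, d) pairs for which succeeds(s, d) is True."""
    label = components(faults, M, N)
    pairs = [(s, d) for s in label for d in label
             if s != d and label[s] == label[d]]
    if not pairs:
        return 0.0
    return sum(1 for s, d in pairs if succeeds(s, d)) / len(pairs)


print(throughput(32, 8, 8, 1))                               # 0.5
print(pdr(58, 100))                                          # 0.58
print(fault_adaptive_score(lambda s, d: True, set(), 4, 4))  # 1.0, matching FT(0) = 1
```

Restricting FT to connected pairs separates routing failures from cases where faults have physically partitioned the network.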

5. Simulation and Experimental Setup

Experiments are conducted on an $8 \times 8$ torus (64 routers) using wormhole switching and 4-flit input buffers per port. Traffic is constant-bit-rate with uniform random destinations, while fault sets are injected uniformly at densities $f \in \{0.0, 0.1, \ldots, 0.5\}$. Offered load $\rho$ is varied from 0.1 to 1.0 to enable throughput evaluation in both low- and high-load regimes. The RL agent is trained, and both routers are evaluated, under identical hardware constraints and input distributions.
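A sketch of the experiment grid and fault injection is shown below. Two details are assumptions rather than facts from the text: a density $f$ is taken to mean `round(f * M * N)` faulty routers drawn uniformly without replacement, and the load sweep uses a step of 0.1.

```python
import random

M, N = 8, 8
FAULT_DENSITIES = [i / 10 for i in range(6)]   # 0.0, 0.1, ..., 0.5
OFFERED_LOADS = [i / 10 for i in range(1, 11)]  # 0.1, ..., 1.0 (step size assumed)


def inject_faults(M, N, f, rng):
    """Pick round(f * M * N) faulty routers uniformly at random (assumed fault model)."""
    nodes = [(x, y) for x in range(M) for y in range(N)]
    return set(rng.sample(nodes, round(f * M * N)))


rng = random.Random(42)
for f in FAULT_DENSITIES:
    faults = inject_faults(M, N, f, rng)
    print(f"f={f:.1f}: {len(faults)} of {M * N} routers faulty")
```

Seeding the generator makes each fault set reproducible, so both routers can be evaluated on the same fault patterns.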

6. Performance Results and Analysis

The RL-based adaptive minimal router outperforms the Odd–Even baseline across diverse fault and traffic conditions:

Throughput ( $f=0.2$ ):

Baseline saturates at $\sim 0.56$–$0.59$ for $\rho>0.4$
RL-PPO achieves near-unity throughput ( $\Psi \approx 0.98$–$1.00$ ) across all loads
$\sim 30$–$40\%$ gain for RL-PPO under heavy load

PDR at increasing fault density:

Fault $f$	PDR_OE (Odd–Even)	PDR_RL (RL agent)	Relative Gain
0.2	0.58	0.64	+10%
0.3	0.42	0.52	+24%
0.4	0.30	0.38	+27%
0.5	0.15	0.25	+67%
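The relative gains follow from $(\mathrm{PDR}_{RL} - \mathrm{PDR}_{OE}) / \mathrm{PDR}_{OE}$. A short check that recomputes them from the PDR values in the table (the helper name `relative_gain_pct` is illustrative):

```python
# (fault density, PDR_OE, PDR_RL) as listed in the table above
rows = [
    (0.2, 0.58, 0.64),
    (0.3, 0.42, 0.52),
    (0.4, 0.30, 0.38),
    (0.5, 0.15, 0.25),
]


def relative_gain_pct(baseline, improved):
    """Relative improvement over the baseline, rounded to a whole percent."""
    return round(100 * (improved - baseline) / baseline)


for f, pdr_oe, pdr_rl in rows:
    print(f"f={f}: +{relative_gain_pct(pdr_oe, pdr_rl)}%")
# f=0.2: +10%
# f=0.3: +24%
# f=0.4: +27%
# f=0.5: +67%
```

The absolute gap stays between 6 and 10 percentage points across the table, so the growing relative gain is driven mainly by the shrinking baseline PDR.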

Fault Adaptive Score (FT):

The RL agent sustains $FT \geq 0.5$ up to $f=0.3$, whereas the baseline collapses for $f>0.2$.

Fault $f$	FT_OE	FT_RL
0.1	0.61	0.80
0.2	0.56	0.60
0.3	0.20	0.50
0.4	0.10	0.38
0.5	0.03	0.25

Key factors underlying results:

RL agent exploits path diversity, balancing wrap-around and non-trivial detours, while maintaining minimality
Fault-aware reward scaling promotes robust avoidance of isolated clusters
Strict minimal-hop eligibility in the action space avoids deadlocks while still letting the agent discover alternative minimal routes that static XY heuristics cannot encode
PPO stability enables convergence in $\sim$ 500 episodes with consistent fault/load performance

This suggests that RL-based decentralized routing can leverage network state and topology awareness to maximize packet delivery and throughput, even as static turn-based algorithms sharply degrade under rising fault densities.

7. Context and Significance

Charrwi & Hussain's findings systematically demonstrate the advantages of RL-driven adaptive minimal routing in 2D torus NoCs, notably under practical scenarios of dynamic faults and varying traffic. The RL router’s decentralized operation, guided by locally observable state and minimal path eligibility, enables the network to self-adapt to stochastic conditions with a throughput gain of $20$–$30\%$ and up to $60\%$ higher Packet Delivery Ratio and Fault Adaptive Score relative to traditional schemes.

A plausible implication is that the inclusion of RL agents at each router allows global network metrics (connectivity, path diversity, congestion) to be implicitly factored into local routing actions via episodic reward shaping, bridging the gap between static minimal routing and dynamic fault resilience.

These findings are directly relevant for next-generation NoC architectures requiring scalable, fault-tolerant, and high-throughput communication substrates (Charrwi et al., 15 Dec 2025).

References (1)

Toward Self-Healing Networks-on-Chip: RL-Driven Routing in 2D Torus Architectures (2025)
