MaxScore Routing: Optimal Wireless & MoE

Updated 18 April 2026

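MaxScore routing refers to two independently developed methods for optimal assignment: relay selection in wireless ad hoc networks and token-to-expert assignment in sparse Mixture-of-Experts (MoE) models. They target near-optimal performance under partial CSI and under expert-capacity constraints, respectively.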
In wireless ad hoc networks, it leverages partial channel state information and Monte Carlo techniques to maximize the Asymptotic Density of Rate-Progress, outperforming classic heuristics by up to 180%.
For sparse Mixture-of-Experts, the method casts token-to-expert assignment as a min-cost maximum-flow problem using a SoftTopk operator to achieve balanced, differentiable routing without token dropping.

Maximum Score Routing (MaxScore) refers to a family of methods for optimal assignment or relay selection developed independently in two distinct research domains: (1) distributed routing in random wireless ad hoc networks under partial channel state information (CSI), and (2) sparse Mixture-of-Experts (MoE) deep learning architectures, where routing tokens to experts is cast as a flow optimization problem. Despite the different contexts, both approaches share a defining principle. Each candidate (a relay or an expert) receives an explicitly defined score, and the assignment maximizes that score subject to structural constraints. This yields provably optimal decisions in the wireless setting and near-optimal, capacity-respecting assignments in the MoE setting, both improving on classical heuristics.

1. MaxScore in Wireless Ad Hoc Networks: System Model and Score Definition

In the context of wireless ad hoc networks, MaxScore is formalized as the statistically optimal (SO) one-hop routing mechanism based strictly on partial CSI: the instantaneous locations of, and channel gains to, all neighbor nodes within a routing zone of fixed radius. Nodes are assumed to be distributed according to a homogeneous planar Poisson point process (PPP) of density $\lambda$, to access the channel via slotted ALOHA with per-slot transmission probability $p_{\mathrm{tx}}$, and to experience Rayleigh fading. The instantaneous rate over the link $j \rightarrow i$ is $R_{i,j}=B\log_2(1+S_{i,j}/(J_{i,j}+\sigma_v^2))$. Here $B$ is the bandwidth, $\sigma_v^2$ the noise power, and $S_{i,j} = \rho r_{i,j}^{-\alpha} W_{i,j}$ the instantaneous signal power, with transmit power $\rho$, link distance $r_{i,j}$, path-loss exponent $\alpha$, and exponentially distributed fading power gain $W_{i,j}$. Finally, $J_{i,j}$ is the aggregate interference from concurrently transmitting PPP nodes.
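To make the link model concrete, here is a minimal Python sketch that draws one realization of the aggregate interference $J$ and evaluates $R = B\log_2(1+S/(J+\sigma_v^2))$. It assumes interferers form an ALOHA-thinned PPP of intensity $\lambda p_{\mathrm{tx}}$ confined to an annulus around the receiver. The guard radius `r_min`, the outer radius `r_max`, all numeric values, and the function names are illustrative assumptions, not parameters or code from Richter et al. (2018).

```python
import math
import random


def link_rate(signal, interference, noise_var, bandwidth=1.0):
    """Shannon rate R = B * log2(1 + S / (J + sigma_v^2)) of a single link."""
    return bandwidth * math.log2(1.0 + signal / (interference + noise_var))


def sample_poisson(mean, rng):
    """Draw a Poisson(mean) count by multiplying uniforms (Knuth); fine for small means."""
    limit, count, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return count
        count += 1


def sample_interference(density, p_tx, rho, alpha, r_min, r_max, rng):
    """One draw of the aggregate interference J at a receiver.

    Interferers form an ALOHA-thinned PPP of intensity density * p_tx inside the
    annulus r_min <= r <= r_max around the receiver (r_min is an assumed guard
    distance that keeps r**-alpha finite). Rayleigh fading gives Exp(1) power gains.
    """
    area = math.pi * (r_max ** 2 - r_min ** 2)
    total = 0.0
    for _ in range(sample_poisson(density * p_tx * area, rng)):
        # Uniform position in the annulus: r^2 is uniform on [r_min^2, r_max^2].
        r = math.sqrt(r_min ** 2 + rng.random() * (r_max ** 2 - r_min ** 2))
        total += rho * r ** (-alpha) * rng.expovariate(1.0)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    rho, alpha, sigma2 = 1.0, 4.0, 1e-3
    r_link, w_link = 2.0, rng.expovariate(1.0)  # link distance and fading gain
    signal = rho * r_link ** (-alpha) * w_link
    j = sample_interference(0.1, 0.2, rho, alpha, 1.0, 10.0, rng)
    print(f"S={signal:.4f}  J={j:.4f}  R={link_rate(signal, j, sigma2):.3f} bit/s/Hz")
```

A quick sanity check on the sampler: by Campbell's theorem with unit-mean fading, the sample mean of $J$ should approach $2\pi\lambda p_{\mathrm{tx}}\rho\,(r_{\min}^{2-\alpha}-r_{\max}^{2-\alpha})/(\alpha-2)$ for $\alpha>2$.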

The routing decision at each transmitting node maximizes the Asymptotic Density of Rate-Progress (ADORP), which serves as a proxy for the aggregate throughput-distance product (rate multiplied by progress, per unit area). Formally, given the local CSI $\mathcal{M}_0$ available at transmitter $0$, the SO (MaxScore) rule selects the neighbor $i^*$ that maximizes

$m_{\mathrm{SO}}(i,\mathcal{M}_0) = r_{i,0}\, \mathbb{E}_{J_{i,0}|\mathcal{M}_0}\left[\log_2\left(1 + \frac{\rho\,r_{i,0}^{-\alpha}W_{i,0}}{J_{i,0} + \sigma_v^2}\right)\right],$

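where $r_{i,0}$ is the distance from the transmitter to neighbor $i$ and $W_{i,0}$ is the corresponding fading gain. The expectation is taken only over the unknown interference $J_{i,0}$. Each node can decide independently and still be optimal because routing choices do not influence the spatial statistics of future interference (Richter et al., 2018).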

2. MaxScore for Sparse Mixture-of-Experts: Flow Modeling and SoftTopk

In large-scale MoE models, MaxScore Routing reinterprets token-to-expert assignment as a minimum-cost maximum-flow problem on a bipartite graph. Each batch consists of $n$ tokens and $m$ experts, and each expert has a fixed token capacity $c$. For each token $i$, affinity scores $s_{ij}$ for expert $j$ are computed with a differentiable SoftTopk operator. This operator modifies the top-$k$ scoring mechanism so that scores remain smooth and gradients stay balanced across experts.

The routing problem is formulated as:

$\min_{x}\ \sum_{i=1}^{n}\sum_{j=1}^{m} \left(-s_{ij}\right) x_{ij} \quad \text{s.t.} \quad \sum_{j=1}^{m} x_{ij} \le k \;\; \forall i, \quad \sum_{i=1}^{n} x_{ij} \le c \;\; \forall j, \quad x_{ij} \in \{0,1\},$

where $x_{ij}=1$ indicates that token $i$ is routed to expert $j$. The cost of each token-expert edge is $-s_{ij}$, so a minimum-cost flow maximizes total affinity. The constraints cap each token at $k$ experts and each expert at $c$ tokens (Dong et al., 18 Aug 2025).
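For small instances the flow problem can be solved exactly. The sketch below builds a source → token → expert → sink network, then runs successive shortest paths with Bellman-Ford, which tolerates the negative costs. Token edges have capacity $k$, token-expert edges have capacity 1 and cost $-s_{ij}$, and expert edges have capacity $c$. Here $s_{ij}$ is token $i$'s affinity for expert $j$, $k$ the number of experts per token, and $c$ the expert capacity. This is a plain CPU reference under these assumptions, not the tensorized GPU routine of Dong et al., and `maxscore_flow_assign` is a hypothetical name.

```python
import math
from typing import List, Tuple


def maxscore_flow_assign(scores: List[List[float]], k: int, capacity: int) -> List[Tuple[int, int]]:
    """Min-cost max-flow routing: returns the (token, expert) pairs that carry flow."""
    n, m = len(scores), len(scores[0])
    source, sink = n + m, n + m + 1          # nodes: tokens 0..n-1, experts n..n+m-1
    graph: List[List[int]] = [[] for _ in range(n + m + 2)]
    to: List[int] = []
    cap: List[int] = []
    cost: List[float] = []

    def add_edge(u: int, v: int, c: int, w: float) -> None:
        # Forward edge gets an even index; its residual twin gets the next (odd) index.
        for a, b, cc, ww in ((u, v, c, w), (v, u, 0, -w)):
            graph[a].append(len(to))
            to.append(b)
            cap.append(cc)
            cost.append(ww)

    for i in range(n):
        add_edge(source, i, k, 0.0)                  # at most k experts per token
        for j in range(m):
            add_edge(i, n + j, 1, -scores[i][j])     # cost -s_ij
    for j in range(m):
        add_edge(n + j, sink, capacity, 0.0)         # at most c tokens per expert

    while True:
        # Bellman-Ford shortest path in the residual graph (handles negative costs).
        dist = [math.inf] * len(graph)
        via = [-1] * len(graph)
        dist[source] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]] - 1e-12:
                        dist[to[e]] = dist[u] + cost[e]
                        via[to[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == math.inf:
            break                                    # no augmenting path: flow is maximum
        push, v = math.inf, sink
        while v != source:                           # bottleneck capacity on the path
            push = min(push, cap[via[v]])
            v = to[via[v] ^ 1]
        v = sink
        while v != source:                           # augment along the path
            cap[via[v]] -= push
            cap[via[v] ^ 1] += push
            v = to[via[v] ^ 1]

    # A saturated token->expert forward edge means x_ij = 1.
    return [(i, to[e] - n) for i in range(n) for e in graph[i]
            if e % 2 == 0 and n <= to[e] < n + m and cap[e] == 0]


if __name__ == "__main__":
    scores = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.6]]    # every token prefers expert 0
    print(maxscore_flow_assign(scores, k=1, capacity=2))
```

In the example every token prefers expert 0, but capacity $c=2$ forces one token elsewhere. The flow moves token 2, whose affinity loss ($0.7-0.6$) is smallest. Because all capacities are integers, the resulting flow is integral, so $x_{ij}\in\{0,1\}$ holds without rounding.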

SoftTopk produces soft affinity distributions that are differentiable, enabling efficient training by propagating gradients to the gating mechanism.
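The exact SoftTopk operator is defined in Dong et al. As an illustration only, the sketch below uses one common way to smooth top-$k$. It picks a threshold $\tau$ such that $\sum_j \sigma((s_j-\tau)/T) = k$ and uses the sigmoid values as soft selection weights. The temperature $T$, the bisection search, and the name `soft_topk` are assumptions of this sketch.

```python
import math
from typing import List


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_topk(scores: List[float], k: int, temperature: float = 0.1, iters: int = 100) -> List[float]:
    """Soft top-k weights w_j = sigmoid((s_j - tau) / T), with tau chosen so sum_j w_j = k.

    Each w_j lies in (0, 1) and is a smooth function of the scores; as T -> 0 the
    weights approach the hard top-k mask (assuming no ties at the k-th score).
    """
    if not 0 < k < len(scores):
        raise ValueError("need 0 < k < number of experts")

    def mass(tau: float) -> float:
        return sum(_sigmoid((s - tau) / temperature) for s in scores)

    # mass(tau) decreases in tau: close to len(scores) at `lo`, close to 0 at `hi`.
    lo = min(scores) - 50.0 * temperature
    hi = max(scores) + 50.0 * temperature
    for _ in range(iters):                 # bisection for mass(tau) = k
        mid = 0.5 * (lo + hi)
        if mass(mid) > k:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return [_sigmoid((s - tau) / temperature) for s in scores]


if __name__ == "__main__":
    affinities = [2.0, 1.0, 0.5, -1.0]
    for t in (0.05, 1.0, 10.0):
        print(t, [round(w, 3) for w in soft_topk(affinities, k=2, temperature=t)])
```

At low temperature the weights approach a hard top-$k$ mask. At higher temperature every expert keeps a non-zero weight, so every expert score receives a gradient.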

3. Algorithms and Implementation

(a) Wireless Routing: Optimal and Suboptimal Schemes

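The SO (MaxScore) rule performs the following steps for each neighbor $i$ in the routing zone:

Compute the signal power $S_{i,0} = \rho\, r_{i,0}^{-\alpha} W_{i,0}$ from the known distance and fading gain,
Numerically estimate $\mathbb{E}_{J_{i,0}|\mathcal{M}_0}\left[\log_2\left(1 + S_{i,0}/(J_{i,0} + \sigma_v^2)\right)\right]$, typically via Monte Carlo over feasible interference realizations,
Score $m_{\mathrm{SO}}(i,\mathcal{M}_0)$ as $r_{i,0}$ times this expectation,
Select $i^* = \arg\max_i m_{\mathrm{SO}}(i,\mathcal{M}_0)$ (Richter et al., 2018).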
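The four SO steps can be sketched in Python as follows. The sketch makes a simplifying assumption that is not from the source: the interference at every candidate receiver is drawn from the same ALOHA-thinned PPP model, independent of $\mathcal{M}_0$. Under that assumption, one set of Monte Carlo samples of $J$ can be shared by all neighbors (common random numbers), which also reduces the variance of score comparisons. Parameter values, neighbor data, and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Poisson(mean) draw by multiplying uniforms (Knuth); fine for small means."""
    limit, count, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return count
        count += 1


def sample_interference(rng, density=0.1, p_tx=0.2, rho=1.0, alpha=4.0, r_min=1.0, r_max=10.0):
    """One draw of J: ALOHA-thinned PPP in an annulus around the receiver, Exp(1) fading.
    All default parameters are illustrative assumptions."""
    area = math.pi * (r_max ** 2 - r_min ** 2)
    total = 0.0
    for _ in range(sample_poisson(density * p_tx * area, rng)):
        r = math.sqrt(r_min ** 2 + rng.random() * (r_max ** 2 - r_min ** 2))
        total += rho * r ** (-alpha) * rng.expovariate(1.0)
    return total


def so_select(neighbors: Sequence[Tuple[float, float]], rho: float, alpha: float,
              noise_var: float, j_samples: Sequence[float]) -> Tuple[int, List[float]]:
    """Return (i*, scores), score_i = r_i * sample mean of log2(1 + S_i / (J + sigma_v^2))."""
    scores = []
    for r, w in neighbors:
        signal = rho * r ** (-alpha) * w          # S_{i,0}: known from the local CSI
        mean_rate = sum(math.log2(1.0 + signal / (j + noise_var)) for j in j_samples) / len(j_samples)
        scores.append(r * mean_rate)              # m_SO(i, M_0): progress x expected rate
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    rng = random.Random(7)
    j_samples = [sample_interference(rng) for _ in range(5000)]
    # (distance, fading gain) of each neighbor in the routing zone: made-up values
    neighbors = [(1.0, 0.3), (2.5, 1.4), (4.0, 0.2)]
    best, scores = so_select(neighbors, rho=1.0, alpha=4.0, noise_var=1e-3, j_samples=j_samples)
    print(best, [round(s, 3) for s in scores])
```

The score trades progress against reliability. A distant neighbor wins while its expected rate stays reasonable, but a deep fade on that link (small $W_{i,0}$) hands the decision to a closer node.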

Low-complexity variants, namely Bound-Optimal (BO), Narrow-Knowledge SO (NSO), and Narrow-Knowledge BO (NBO), replace the Monte Carlo estimate of the expectation over $J_{i,0}$ with increasingly coarse deterministic bounds or lookups on the interference. They trade a small loss (≤4%) in aggregate performance for reduced computational cost.
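The exact bounds used by BO, NSO, and NBO are given in Richter et al. (2018). As an illustration of the general idea only, the sketch below replaces the Monte Carlo expectation with a deterministic Jensen-type bound. Since $f(J)=\log_2(1+S/(J+\sigma_v^2))$ is convex in $J$, we have $\mathbb{E}[f(J)] \ge f(\mathbb{E}[J])$. For an ALOHA-thinned PPP with unit-mean fading and interferers at distances $r \ge r_{\min}$, Campbell's theorem gives $\mathbb{E}[J] = 2\pi\lambda p_{\mathrm{tx}}\rho\, r_{\min}^{2-\alpha}/(\alpha-2)$ for $\alpha > 2$. The guard radius $r_{\min}$ and the function names are assumptions.

```python
import math


def mean_interference(density, p_tx, rho, alpha, r_min, r_max=math.inf):
    """E[J] for an ALOHA-thinned PPP (intensity density * p_tx) with unit-mean fading,
    interferers at distances r_min <= r <= r_max; requires alpha > 2 and r_min > 0."""
    if alpha <= 2 or r_min <= 0:
        raise ValueError("need alpha > 2 and r_min > 0 for a finite mean")
    outer = 0.0 if math.isinf(r_max) else r_max ** (2 - alpha)
    return 2 * math.pi * density * p_tx * rho * (r_min ** (2 - alpha) - outer) / (alpha - 2)


def jensen_score(r, w, rho, alpha, noise_var, mean_j):
    """Lower bound on the SO score. log2(1 + S / (J + s2)) is convex in J, so by Jensen
    E[log2(1 + S / (J + s2))] >= log2(1 + S / (E[J] + s2)); only E[J] is needed."""
    signal = rho * r ** (-alpha) * w
    return r * math.log2(1.0 + signal / (mean_j + noise_var))


if __name__ == "__main__":
    mj = mean_interference(density=0.1, p_tx=0.2, rho=1.0, alpha=4.0, r_min=1.0)
    for r, w in [(1.0, 0.3), (2.5, 1.4), (4.0, 0.2)]:   # made-up neighbor CSI
        print(r, w, round(jensen_score(r, w, 1.0, 4.0, 1e-3, mj), 3))
```

Because the bound needs only $\mathbb{E}[J]$, each neighbor's score costs one logarithm instead of a Monte Carlo average.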

(b) MaxScore MoE Routing: Two-Stage Flow Assignment

MaxScore for MoE adopts a two-pass routing process:

Compute affinities $s_{ij}$ using SoftTopk,
First stage: greedily assign each token to its top-1 expert,
Second stage: fill the remaining slots of each token's top-$k$ allocation from the residual expert capacity using a Sinkhorn approximation, keeping expert fill exact or near-exact without dropping tokens,
The process is fully differentiable and optimized for tensorized GPU computation (Dong et al., 18 Aug 2025).
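The following CPU sketch of the two stages rests on assumptions that go beyond the source. Stage 1 processes tokens in order of their strongest affinity and gives each its best expert with free capacity. Stage 2 runs Sinkhorn scaling on the masked kernel $\exp(s_{ij}/\varepsilon)$, so that each token's row sums to its remaining $k-1$ slots and each expert's column sums to its residual capacity, then rounds the fractional plan greedily. The sketch assumes exact fill ($nk = mc$, with $n$ tokens, $m$ experts, capacity $c$). Unlike an exact flow solve, its greedy rounding can occasionally leave a slot unfilled, so the function reports that count. Names and defaults (`two_stage_route`, `eps`) are hypothetical, and unlike the method described above, this sketch is not differentiable.

```python
import math
from typing import List, Tuple


def two_stage_route(scores: List[List[float]], k: int, capacity: int,
                    eps: float = 0.1, sinkhorn_iters: int = 200) -> Tuple[List[List[int]], int]:
    """Route n tokens to k experts each (m experts, `capacity` slots per expert).

    Returns (sorted experts chosen per token, number of unfilled token slots).
    """
    n, m = len(scores), len(scores[0])
    if not 1 <= k <= m or n * k != m * capacity:
        raise ValueError("sketch assumes 1 <= k <= m and exact fill n * k == m * capacity")
    load = [0] * m
    chosen = [set() for _ in range(n)]

    # Stage 1: greedy top-1. Tokens with the strongest preference go first and
    # take their best expert that still has free capacity.
    for i in sorted(range(n), key=lambda t: max(scores[t]), reverse=True):
        for j in sorted(range(m), key=lambda e: scores[i][e], reverse=True):
            if load[j] < capacity:
                chosen[i].add(j)
                load[j] += 1
                break

    need = k - 1  # slots per token still to fill
    if need > 0:
        residual = [capacity - load[j] for j in range(m)]
        top = max(max(row) for row in scores)  # shift exponents for numerical stability
        # Gibbs kernel exp(s_ij / eps); stage-1 pairs are masked out.
        kern = [[0.0 if j in chosen[i] else math.exp((scores[i][j] - top) / eps)
                 for j in range(m)] for i in range(n)]
        u, v = [1.0] * n, [1.0] * m
        for _ in range(sinkhorn_iters):
            # Alternate row scaling (target: need) and column scaling (target: residual).
            u = [need / max(sum(kern[i][j] * v[j] for j in range(m)), 1e-300) for i in range(n)]
            v = [residual[j] / max(sum(kern[i][j] * u[i] for i in range(n)), 1e-300)
                 for j in range(m)]
        plan = [[u[i] * kern[i][j] * v[j] for j in range(m)] for i in range(n)]

        # Greedy rounding: commit pairs in decreasing transport mass while both sides have room.
        candidates = sorted(((plan[i][j], i, j) for i in range(n) for j in range(m)
                             if j not in chosen[i]), reverse=True)
        for _, i, j in candidates:
            if len(chosen[i]) < k and load[j] < capacity:
                chosen[i].add(j)
                load[j] += 1

    unfilled = sum(k - len(experts) for experts in chosen)
    return [sorted(experts) for experts in chosen], unfilled


if __name__ == "__main__":
    scores = [[0.9, 0.1, 0.4], [0.8, 0.3, 0.2], [0.2, 0.7, 0.6],
              [0.5, 0.5, 0.1], [0.1, 0.2, 0.9], [0.6, 0.4, 0.3]]
    routes, unfilled = two_stage_route(scores, k=2, capacity=4)
    print(routes, "unfilled slots:", unfilled)
```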

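This approach contrasts with prior MoE routing. Capacity-constrained routers such as GShard drop tokens when an expert overflows and pad under-used experts. Dropless routing (DropLess) avoids dropping by removing the capacity limit, at the cost of load balance and hardware efficiency.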

4. Theoretical Optimality and Analysis

In wireless networks, MaxScore routing is proven optimal for maximizing ADORP under partial CSI, given the spatial independence properties of the PPP and the purely local CSI. Optimality hinges on the fact that routing choices do not alter the distribution of interfering transmitters, so the interference statistics faced by later decisions are unaffected by the current one. The suboptimal schemes are shown to remain within 4% of SO's performance and consistently outperform traditional geographic or threshold-based greedy heuristics by 30–180% in simulated throughput (Richter et al., 2018).

In MoE networks, formulating routing as a maximum-flow optimization with SoftTopk ensures that all tokens are assigned and expert loads are balanced, while computational efficiency is preserved without recourse to token dropping or padding. These properties are achieved with hardware throughput and memory usage comparable to, or slightly below, those of existing approaches (Dong et al., 18 Aug 2025).

5. Empirical Performance and Trends

Simulation and training results across both domains validate the superiority of MaxScore methodologies:

Wireless Ad Hoc Networks: The low-complexity BO, NSO, and NBO schemes trail SO by ≤4% in ADORP, with BO nearly matching SO (≈1% loss). All MaxScore variants surpass classical geographic or nearest-neighbor routing by a substantial margin. The optimal $p_{\mathrm{tx}}$ is empirically found near 0.2 for typical parameter regimes; small routing zones degrade performance for all methods, but MaxScore keeps its lead (Richter et al., 2018).
Sparse MoE: On LLaMA-style Transformer architectures trained on C4 (65B tokens), MaxScore consistently achieves higher evaluation scores and faster convergence than the GShard, DropLess, and DeepSeek baselines. At one of the evaluated sparsity ratios, MaxScore reaches 44.21% vs. 42.81% for GShard. MaxScore fully eliminates token dropping and achieves exact or near-exact expert utilization, with identical model FLOPs and negligible extra routing overhead (Dong et al., 18 Aug 2025).
Ablation Analysis: The combined use of SoftTopk and flow-based allocation in MaxScore demonstrates superadditive performance gains over their isolated use, confirming the necessity of both innovations for optimality (Dong et al., 18 Aug 2025).

6. Comparative Summary of MaxScore Methodologies

Domain	Routing Objective	Score Definition	Complexity	Performance Gap (vs. optimal)
Wireless Ad Hoc (SO)	Maximize ADORP	$m_{\mathrm{SO}}(i,\mathcal{M}_0)$ (Monte Carlo estimate)	High	Optimal
Wireless (BO, NSO, NBO)	Maximize lower-bounded ADORP	Deterministic/integral bounds or lookup	Low–Medium	≤4%
MoE MaxScore	Maximize affinity allocation	$\sum_{i,j} s_{ij} x_{ij}$ (min-cost flow)	Medium	Near-optimal

7. Broader Impact and Insights

MaxScore reframes the routing problem in both wireless networking and neural architectures as an explicit optimization of a performance metric under information and resource constraints. In wireless networks, it leverages statistical independence and partial CSI to realize a distributed protocol with guaranteed throughput benefits. In MoE systems, MaxScore resolves inefficiencies due to expert capacity constraints and gradient imbalance, implementing a scalable, differentiable, and token-efficient mechanism for deep learning architectures. A plausible implication is that analogous maximum-score formulations may be transferable to other constrained resource allocation tasks where local information or differentiable structure is essential for scalable optimization.

References:

Richter et al., 2018. Optimal and Suboptimal Routing Based on Partial CSI in Random Ad-hoc Networks.
Dong et al., 18 Aug 2025. Maximum Score Routing For Mixture-of-Experts.
