BiRouter: Dual-Criteria Routing Framework

Updated 7 December 2025

BiRouter is a dual-criteria routing framework that integrates two distinct metrics, ImpScore and GapScore, for efficient decentralized routing in both multi-agent systems and hardware architectures.
It employs a hybrid decision strategy that uses learned neural modules and a reputation mechanism to balance long-term goal relevance with local contextual alignment, ensuring robust task delegation.
BiRouter facilitates efficient two-dimensional policy routing in hardware, achieving significant TCAM savings and low latency, while supporting scalable updates with a colored-tree structure.

BiRouter denotes a class of dual-criteria or two-dimensional routing frameworks that leverage the interplay between multiple routing metrics to achieve decentralized, robust, and efficient next-hop selection. Prominent instantiations of BiRouter include: (i) hybrid decision strategies in Multi-Agent Systems (MAS) for collaborative task delegation (Yang et al., 30 Nov 2025); and (ii) hardware routers capable of forwarding traffic based on two address dimensions, such as source and destination (see TwoD Router/FIST) (Yang et al., 2019). Both domains exhibit a central theme: balancing orthogonal criteria for efficient and scalable routing in complex, decentralized environments.

1. Dual-Criteria Routing in Decentralized Multi-Agent Systems

In open, decentralized MAS, agents lack a global view, precluding traditional centralized or static routing and planning strategies. Each agent’s local decision—forwarding a task or message to a successor—must simultaneously optimize for:

Long-term goal relevance (“Importance Score”, ImpScore): quantifies an agent’s overall criticality for the task.
Local contextual alignment (“Gap Score”, GapScore): measures the smoothness of integrating the candidate agent given the current chain history.

BiRouter introduces a lightweight, locally executed routing heuristic that synthesizes these two criteria using learned neural modules. The decision rule is a convex combination of ImpScore and GapScore with weight $\alpha \in [0,1]$, gated elementwise by a dynamically updated reputation (credit) score that downweights unreliable agents:

$\mathrm{Logits}_{i+1} = S^{\mathrm{crd}}_{i+1} \odot \left[\,\alpha S^{\mathrm{Imp}}_{i+1} + (1-\alpha) S^{\mathrm{Gap}}_{i+1}\right]$

The local policy is then obtained by a softmax over the candidate logits. See (Yang et al., 30 Nov 2025) for explicit definitions:

$\mathrm{ImpScore}: \mathcal{B} \times \mathcal{Q} \to \mathbb{R}^{n \times |\mathcal{Q}|}$ , based on agent-task rankings.
$\mathrm{GapScore}(a_i, k) = \frac{1}{i-k+1}$ for $i \geq k$ , $0$ otherwise.
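The decision rule can be illustrated with a minimal Python sketch. It assumes the ImpScore and GapScore values for each candidate are already available as plain numbers (in BiRouter they come from learned neural modules, which are not modeled here). The function names `gap_score` and `route_probabilities`, the default `alpha=0.5`, and the example values are illustrative assumptions, not part of the published method.

```python
import math

def gap_score(i: int, k: int) -> float:
    """GapScore(a_i, k) = 1 / (i - k + 1) for i >= k, and 0 otherwise."""
    return 1.0 / (i - k + 1) if i >= k else 0.0

def route_probabilities(imp, gap, credit, alpha=0.5):
    """Softmax over credit-gated logits, one entry per candidate successor.

    imp, gap, credit are equal-length lists of per-candidate scores.
    alpha is the ImpScore/GapScore tradeoff weight (assumed value).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination")
    if not (len(imp) == len(gap) == len(credit)):
        raise ValueError("score lists must have the same length")
    # Logits = S_crd * (alpha * S_imp + (1 - alpha) * S_gap), elementwise.
    logits = [c * (alpha * si + (1.0 - alpha) * sg)
              for si, sg, c in zip(imp, gap, credit)]
    # Numerically stable softmax.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidates; the third has a low credit score.
probs = route_probabilities(
    imp=[0.9, 0.4, 0.7],
    gap=[0.2, 0.8, 0.7],
    credit=[1.0, 1.0, 0.1],
)
best = max(range(len(probs)), key=probs.__getitem__)
```

In this toy example the third candidate has competitive raw scores, but its low credit shrinks its logit, so it receives the smallest probability.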

2. Reputation Mechanisms for Robustness

To mitigate the impact of untrustworthy or adversarial agents, BiRouter incorporates a reputation system:

Each agent maintains a positive credit score $S^{\mathrm{crd}}(a_j)$ .
Upon task completion, a global (but decentralized) LLM-based evaluator assigns multiplicative update factors $f_j$ reflecting agent reliability.
The credit score is updated as $S^{\mathrm{crd}}(a_j) \leftarrow S^{\mathrm{crd}}(a_j) \cdot f_j$ .
This gating sharpens the resilience of the agent network, ensuring that unreliable agents are systematically deprioritized in routing decisions (Yang et al., 30 Nov 2025).
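The multiplicative credit update can be sketched as follows. The evaluator's factors $f_j$ are taken as given inputs (the LLM-based evaluator itself is not modeled). The function name `update_credits`, the starting credit of 1.0 for unseen agents, and the agent names are assumptions made for illustration.

```python
def update_credits(credits, factors, initial_credit=1.0):
    """Return new credit scores after applying S_crd(a_j) <- S_crd(a_j) * f_j.

    credits: dict mapping agent id to its current positive credit score.
    factors: dict mapping agent id to the evaluator's multiplicative factor f_j.
    initial_credit: assumed starting score for agents without a record.
    """
    updated = dict(credits)
    for agent, factor in factors.items():
        if factor <= 0:
            # Credits are described as positive, so factors must be positive too.
            raise ValueError("update factors must be positive")
        updated[agent] = updated.get(agent, initial_credit) * factor
    return updated

# Hypothetical run: agent "b" is penalized after two consecutive tasks.
credits = {"a": 1.0, "b": 1.0}
credits = update_credits(credits, {"b": 0.5})
credits = update_credits(credits, {"b": 0.5})
```

Because the update is multiplicative, repeated penalties compound, so a persistently unreliable agent's gate value decays geometrically.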

Reported results indicate that disabling the reputation gate causes performance to degrade sharply in untrustworthy settings (e.g., on GSM8K, from 92.31% to 84.79%), whereas with credit updates enabled the drop is small (94.09% to 92.37%).

3. Large-Scale Cross-Domain Training and Generalization

The BiRouter MAS model leverages the MARS dataset:

115 curated domains with thousands of multi-step queries.
Annotated agent chains supporting both ImpScore and GapScore training.
Enriched low-density regions via a radial basis function—ensuring semantic diversity and stronger generalization (Yang et al., 30 Nov 2025).

Experimental evidence demonstrates state-of-the-art performance in both centralized (full visibility) and decentralized (local successor set) settings, with higher accuracy and lower token consumption compared to DyLAN and MaAS baselines.

Setting and Method	Accuracy (%)	Token Usage (M)
Centralized Single-Agent	84.88	-
Centralized Static MAS	87.04	-
Centralized Dynamic MAS	87.21	-
Centralized BiRouter	91.73	2.8
Decentralized DyLAN	87.95	6.3
Decentralized MaAS	86.43	3.8
Decentralized BiRouter	91.99	2.8

4. Two-Dimensional (Policy) Routing in Hardware: The TwoD Router and FIST

As another major instantiation, BiRouter systems are realized in hardware packet routers with two-dimensional forwarding logic, typically on {destination, source} tuples (Yang et al., 2019):

Pipeline: Separate TCAM matches for destination and source prefixes, outputting indices $r$ and $c$ .
Forwarding Table: SRAM-based $N \times M$ “TD-table” indexed by $(r,c)$ , referencing a final mapping table for the next-hop/interface.
Redundancy Management: By separating TCAM entries for each dimension (rather than naïvely encoding all $(p_d,p_s)$ pairs), the design achieves $O(N+M)$ TCAM bits and $O(NM\log P)$ SRAM bits.

Incremental update algorithms are enabled using a colored-tree structure, minimizing memory writes to the subdomain affected by each rule change.
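The lookup path described above can be emulated in software. This minimal sketch replaces each TCAM with a linear longest-prefix match and uses small lists for the TD-table and mapping table. All prefixes, table contents, and interface names are made up for illustration. It does not model how FIST populates the table cells or the colored-tree update structure.

```python
import ipaddress

def lpm_index(prefixes, addr):
    """Software stand-in for a TCAM: index of the longest matching prefix."""
    ip = ipaddress.ip_address(addr)
    best, best_len = None, -1
    for idx, prefix in enumerate(prefixes):
        net = ipaddress.ip_network(prefix)
        if ip in net and net.prefixlen > best_len:
            best, best_len = idx, net.prefixlen
    return best

# Hypothetical rule set: N = 3 destination prefixes, M = 2 source prefixes.
DST_PREFIXES = ["0.0.0.0/0", "10.0.0.0/8", "10.1.0.0/16"]
SRC_PREFIXES = ["0.0.0.0/0", "192.168.0.0/16"]

# N x M TD-table: cell (r, c) holds an index into the mapping table.
TD_TABLE = [
    [0, 0],
    [1, 2],
    [1, 3],
]

# Mapping table from index to next hop or interface (assumed names).
NEXT_HOPS = ["eth0", "eth1", "eth2", "eth3"]

def lookup(dst, src):
    """Two independent prefix matches, then one TD-table and one mapping read."""
    r = lpm_index(DST_PREFIXES, dst)
    c = lpm_index(SRC_PREFIXES, src)
    if r is None or c is None:
        raise LookupError("no matching prefix")
    return NEXT_HOPS[TD_TABLE[r][c]]
```

For instance, `lookup("10.1.2.3", "192.168.1.1")` selects row 2 and column 1 and reads mapping index 3. The two memory reads after the prefix matches correspond to the two SRAM accesses on the lookup path.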

5. Practical Performance and Scalability

Hardware Throughput: Full line rate (e.g., 4×1 Gbps) is maintained, with FIST’s lookup path incurring one TCAM and two SRAM cycles—effectively $O(1)$ lookup latency.
Memory Efficiency: Redundancy elimination produces up to 99% TCAM savings compared to ACL-style two-dimensional tables. Typical implementations fit in modest (200 MB-scale) SRAM (Yang et al., 2019).
Update Latency: Each update rewrites only a sublinear portion of memory. An ACL-style table would require thousands of SRAM writes per update; the FIST BiRouter typically needs only 100–600 writes.
Adoption: The architecture is amenable to deployment on existing ASIC/FPGA hardware with incremental changes.
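The memory figures above follow from simple counting, which can be checked with a short calculation. The sketch compares $N \times M$ entries for a naive ACL-style encoding against $N + M$ entries for separate per-dimension TCAMs, and estimates SRAM as $N \cdot M \cdot \lceil \log_2 P \rceil$ bits. Reading $P$ as the number of distinct mapping-table entries is an assumption, as are the function names and the sample sizes.

```python
import math

def tcam_saving(n: int, m: int) -> float:
    """Fraction of TCAM entries saved by using N + M instead of N * M."""
    return 1.0 - (n + m) / (n * m)

def sram_bits(n: int, m: int, p: int) -> int:
    """Bits for an N x M table whose cells index one of P entries (assumed meaning of P)."""
    return n * m * math.ceil(math.log2(p))

# Illustrative sizes only, not measurements from the cited work.
saving = tcam_saving(200, 200)      # 400 entries instead of 40,000
bits = sram_bits(1000, 1000, 16)    # 4 bits per cell
```

With 200 prefixes in each dimension the saving already reaches 99%, which is consistent in magnitude with the reported upper figure. The SRAM estimate grows with the product $N \cdot M$, which is why the table size is the main scalability concern.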

Data Plane	Lookup Latency	TCAM Use	Update Writes
ACL-style	O(1)	O(N×M)	O(N×M)
FIST BiRouter	O(1)	O(N+M)	O(

6. Theoretical Complexity and Open Challenges

For MAS BiRouter, per-hop complexity is $O(|\mathcal{C}| \cdot d^2)$ , dominated by two cross-attention neural passes. Network size $n$ only affects initial discovery, not runtime cost per hop. Open challenges include:

Hyperparameter $\alpha$ selection for ImpScore/GapScore tradeoff.
Coverage and quality of synthetic data (MARS) affecting learned heuristics.
Automated or adversarially robust alternatives to LLM-based reputation updates.
Extending to more complex topologies, such as branching or looped agent chains.

For the hardware BiRouter, the primary scalability limit is the quadratic growth of SRAM table size; mitigations include policy coalescing, deduplication, and the use of “policy classes.” Hardware adjustments are bounded within typical line-card budgets for modern routers but are sensitive to the cardinalities $N$ and $M$ .

7. Domain Impact and Generalization

The BiRouter framework establishes a systematic approach to routing and task delegation in environments where decisions must be decentralized and depend on multiple, often orthogonal, criteria. Its principles—metric separation, hybridization, and modular gating—are extensible to a broad range of systems, including both algorithmic MAS and high-speed data plane hardware.

BiRouter achieves statistically significant improvements in both accuracy and resource efficiency in LLM-based MAS settings (Yang et al., 30 Nov 2025), and it enables scalable, flexible two-dimensional policy routing in network hardware deployments (Yang et al., 2019). These results position BiRouter as a unifying architectural and methodological construct for dual-criteria routing across software and hardware domains.