Adaptive Detection Routing (ADR) Overview

Updated 4 July 2026

Adaptive Detection Routing (ADR) is a design pattern that leverages adaptive route diversity or learned route selection to improve detection, localization, diagnosis, and recovery.
It is applied in various domains, including data-center fabrics for gray failure detection, computer vision for robust object detection, and financial anomaly detection with mechanism-specific expert routing.
ADR methods span both operational rerouting and internal expert allocation, aiming for greater sensitivity and interpretability while managing performance trade-offs.

Adaptive Detection Routing (ADR) denotes a family of architectures in which routing decisions are made adaptive to improve detection, localization, diagnosis, or recovery. In the recent arXiv literature, the most explicit network-systems formulation appears in "SprayCheck: Finding Gray Failures in Adaptive Routing Networks," where adaptive routing is used as a measurement signal for passive gray-failure detection in packet-spraying data-center fabrics (Krebs et al., 5 May 2026). Closely related formulations appear inside detectors themselves, where routing selects experts or modulates query interactions in object detection (Meiraz et al., 17 Nov 2025, Zhang et al., 15 Dec 2025), in machine-generated text detection under domain shift (Ma et al., 3 Nov 2025), and in financial anomaly detection via mechanism-specific expert routing (Li et al., 20 Oct 2025). The acronym is not uniform across fields: in LoRaWAN, ADR ordinarily means Adaptive Data Rate (Serati et al., 2022), while in formal methods it also denotes Architectural Design Rewriting (Poyias et al., 2013).

1. Conceptual scope and routing granularity

Across these works, the routed entity is not always a packet path. It may instead be an expert branch, a query-to-query interaction, or a reconfiguration trajectory. This suggests that ADR is best understood as a design pattern in which route diversity or route selection becomes an instrument for detection, rather than as a single standardized protocol (Krebs et al., 5 May 2026, Meiraz et al., 17 Nov 2025, Ma et al., 3 Nov 2025, Li et al., 20 Oct 2025).

Setting	Routed entity	Detection objective
Adaptive-routing data-center fabric	sprayed traffic across spines	gray failure detection and localization
YOLOv9-T Mixture-of-Experts	expert branches at three scales	robust object detection
DETR decoder	pairwise query interactions	reduce redundant query competition
Machine-generated text detection	source-domain experts plus shared experts	domain-general MGT detection
Financial anomaly detection	four mechanism-specific experts	mechanism attribution and early warning

A recurrent distinction is between routing as an operational forwarding decision and routing as a learned internal allocation mechanism. SprayCheck belongs to the former category only indirectly: it does not reroute packets in the forwarding plane, but uses adaptive spraying behavior so that the control plane can reroute after failure localization. The detector-oriented works belong to the latter category: routing governs which experts or interactions are emphasized for a given input.

2. Adaptive-routing networks as observability substrates

In packet-spraying fabrics for distributed ML training, gray failures are difficult because they do not fully break a link or switch. A link may silently drop a small fraction of packets while still appearing healthy to the control plane, yet even a small loss can propagate into application slowdown in bulk-synchronous workloads. SprayCheck exploits a distinctive property of adaptive routing: in a failure-free symmetric 2-level fat tree, a large flow should be spread evenly across candidate spines in expectation, so missing packet mass from one spine becomes evidence of path loss (Krebs et al., 5 May 2026).

Its pipeline is passive and in-network. At the start of a collective, the collective library sends a flow-announcement packet carrying the flow size $N$, queue-pair or identifier information, and metadata sufficient for flow identification; the paper reports an overhead of about $17$ bytes per flow. Each source leaf isolates one cross-leaf measurement flow at a time and prioritizes it at the highest priority queue so that competing traffic does not perturb its spraying distribution. The destination leaf then counts how many marked packets arrive from each spine. For $N$ packets spread over $k$ spines, the expected healthy count per spine is

$\lambda = \mathbb{E}[X_i] = \frac{N}{k},$

with variance modeled approximately as $\sigma^2 \approx \lambda$ for large flows. A gray failure on spine $j$ with drop rate $p$ reduces the expected count to $\lambda(1-p)$, and SprayCheck applies a one-sided $Z$-test with threshold

$T = \lambda - z_{\alpha}\sqrt{\lambda},$

where $z_{\alpha}$ is the standard-normal quantile for the chosen one-sided significance level. If the observed count falls below $T$, the path is flagged.
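The test described above can be sketched in a few lines. This is a minimal illustration, assuming a normal approximation with mean and variance both equal to $N/k$; the function name `flag_lossy_spines` and the default significance level `alpha` are hypothetical and are not taken from the SprayCheck paper.

```python
import math
from statistics import NormalDist

def flag_lossy_spines(counts, flow_size, alpha=1e-3):
    """Return indices of spines whose packet count is significantly below N/k.

    counts: marked packets observed per spine at the destination leaf.
    flow_size: announced flow size N.
    alpha: one-sided false-positive rate per spine (hypothetical default).
    """
    k = len(counts)
    lam = flow_size / k                        # expected healthy count per spine
    sigma = math.sqrt(lam)                     # variance modeled as roughly lambda
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # standard-normal quantile
    threshold = lam - z_alpha * sigma          # one-sided lower threshold
    return [i for i, x in enumerate(counts) if x < threshold]

# Four spines, N = 4000: the last spine is missing about 12% of its share.
print(flag_lossy_spines([1000, 1012, 985, 880], flow_size=4000))  # prints [3]
```

Because the standard deviation grows only as $\sqrt{\lambda}$, larger measurement flows make smaller drop rates detectable at the same significance level.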

Localization proceeds by intersection of path reports. A single failing measurement identifies a path through a spine, which maps to two candidate links, source leaf $\to$ spine and spine $\to$ destination leaf. By combining multiple reports, the central monitoring system infers the common failed link and then triggers a routing-table update. The system therefore complements rather than replaces fast rerouting: detection and localization are the front end, and rerouting remains a control-plane action.
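The intersection step can be illustrated as follows. This is a sketch under the assumption of a single failed link in a 2-level fat tree; the names `candidate_links` and `localize`, and the leaf and spine labels, are hypothetical.

```python
def candidate_links(src_leaf, spine, dst_leaf):
    """A failing path through a spine implicates its uplink and its downlink."""
    return {(src_leaf, spine), (spine, dst_leaf)}

def localize(failing_paths):
    """Intersect candidate links over all failing path reports.

    failing_paths: iterable of (src_leaf, spine, dst_leaf) tuples.
    Returns the links common to every report; the result may hold more
    than one link if the reports do not yet disambiguate.
    """
    suspects = None
    for path in failing_paths:
        links = candidate_links(*path)
        suspects = links if suspects is None else suspects & links
    return suspects or set()

reports = [("leaf0", "spine2", "leaf1"), ("leaf0", "spine2", "leaf3")]
print(localize(reports))  # prints {('leaf0', 'spine2')}
```

A single report leaves two candidates; a second report that shares only the uplink narrows the set to one link, which is the point at which a routing-table update could be triggered.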

The reported evaluation is unusually specific. In a […]-spine topology, SprayCheck detects and localizes a […] single-link packet-drop rate within one training iteration of Llama-3 70B and a […] single-link packet-drop rate within […] training iterations. In the […]-spine testbed, the calibration study reports perfect accuracy for drop rates […] on a single link with a […]k-packet measurement flow, as well as […] false negatives and […] false positives in robustness studies. The method is also reported to have negligible performance impact from prioritizing one measured flow: in a […]-spine congested scenario, the prioritized flow sped up by only […] and other flows slowed by […].

3. Detector-internal routing in computer vision

In object detection, ADR refers to learned routing inside the detector rather than to packet forwarding. "YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection" embeds a Mixture-of-Experts mechanism inside YOLOv9-T. The detector runs multiple YOLOv9-T experts in parallel (two in the reported experiments), and a scale-specific router produces soft weights […] from expert features and a Hadamard-fusion interaction term

[…]

The fused logits are computed as

[…]
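Soft expert weighting of this kind can be sketched generically. The block below is not the paper's router: it assumes a softmax over per-expert router scores, a weighted sum of expert logits, and a simple squared-deviation usage penalty of the sort MoE training commonly adds to discourage collapse onto one expert. All function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_logits(router_scores, expert_logits):
    """Blend per-expert logits with soft router weights (assumed form)."""
    weights = softmax(router_scores)
    n_classes = len(expert_logits[0])
    fused = [
        sum(w * logits[c] for w, logits in zip(weights, expert_logits))
        for c in range(n_classes)
    ]
    return weights, fused

def load_balance_penalty(batch_weights):
    """Squared deviation of mean expert usage from uniform (assumed form)."""
    n_experts = len(batch_weights[0])
    mean_usage = [
        sum(w[e] for w in batch_weights) / len(batch_weights)
        for e in range(n_experts)
    ]
    return sum((u - 1.0 / n_experts) ** 2 for u in mean_usage)

weights, fused = fuse_logits([0.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
print(weights, fused)  # equal router scores give equal weights
```

The penalty is zero when average usage is uniform across experts and grows as routing concentrates on a single expert.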

The model is trained end-to-end because the detection loss is computed before NMS, and a load-balancing term discourages collapse onto a single expert. Quantitatively, the reported MoE-T model trained on COCO + VisDrone reaches […] mAP@0.5:0.95 and […] AR on COCO test, and […] mAP@0.5:0.95 and […] AR on VisDrone test, improving over the reported single YOLOv9-T baselines under the same dataset conditions (Meiraz et al., 17 Nov 2025).

"Route-DETR: Pairwise Query Routing in Transformers for Object Detection" applies routing at a finer granularity: decoder query pairs. Its premise is that many DETR queries converge toward the same object, producing redundant refinement under one-to-one matching. Route-DETR therefore distinguishes competing from complementary queries using inter-query similarity, confidence scores, and geometry. It introduces suppressor routes, which contribute negative attention bias for competing queries, and delegator routes, which contribute positive attention bias for complementary queries. The routed self-attention is implemented by adding a learned pairwise bias matrix […] to the decoder self-attention logits, and the routing biases are used only during training through a dual-branch strategy, preserving standard inference efficiency. The reported gains include […] mAP for DINO with ResNet-50 under a one-to-many training strategy, a […] mAP gain over DINO on ResNet-50, and […] mAP on Swin-L. The ablation study attributes […] mAP to suppressor-only routing, […] to delegator-only routing, and […] when both are used together (Zhang et al., 15 Dec 2025).

These vision models also clarify a common misconception: routing need not be a hard path selection. In both systems, routing is differentiable and sample-adaptive. In the YOLO MoE detector it is soft expert weighting; in Route-DETR it is pairwise attention biasing. The routed entity is thus an internal computational pathway.
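Pairwise attention biasing can be made concrete with a small sketch. It assumes the bias is added to the self-attention logits before a row-wise softmax and collapses the dual-branch training strategy into a single `training` flag; the function name and the bias values are hypothetical, whereas in Route-DETR the biases are learned.

```python
import math

def routed_attention(logits, bias, training=True):
    """Row-wise softmax over query-to-query logits, with a pairwise bias
    added only when training is True (a simplification of dual-branch training)."""
    attn = []
    for i, row in enumerate(logits):
        scores = [s + (bias[i][j] if training else 0.0) for j, s in enumerate(row)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        attn.append([e / z for e in exps])
    return attn

# Three queries with identical raw logits.
logits = [[0.0, 0.0, 0.0] for _ in range(3)]
bias = [
    [0.0, -2.0, 1.0],   # query 0: suppress competing query 1, favor complementary query 2
    [-2.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
train_attn = routed_attention(logits, bias, training=True)
infer_attn = routed_attention(logits, bias, training=False)
print(train_attn[0])  # least weight on query 1, most on query 2
print(infer_attn[0])  # uniform: the bias is not applied at inference
```

Negative entries play the role of suppressor routes and positive entries the role of delegator routes; with the flag off, the computation reduces to standard self-attention.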

4. Instance-adaptive routing for machine-generated text detection

In machine-generated text detection, the principal ADR problem is domain shift. DEER, the "Disentangled mixturE-of-ExpeRts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection," separates expert specialization from expert selection. During the first stage, domain-specific experts are trained only on samples from their corresponding source domains, while domain-crossed or shared experts are trained on all source domains. For an encoded sample […], the domain-aware gate produces weights […] and the fused representation

[…]

followed by classification (Ma et al., 3 Nov 2025).

The second stage addresses the train-inference gap created by unavailable domain labels at test time. DEER formulates routing as a reinforcement-learning policy over source-domain expert groups, with state equal to the final-layer hidden representation and policy

[…]

At inference, the model keeps the top-$K$ domains with the highest routing probabilities, activates the corresponding domain-specific experts together with the shared experts, and computes

[…]

The reported experimental regime uses […] domains from the MAGE benchmark, with five source domains for training and five unseen domains for out-of-domain evaluation, RoBERTa-base as backbone, […] domain-specific experts per domain, and […] shared experts. DEER reports average improvements of […] F1 and […] accuracy on in-domain datasets, and […] F1 and […] accuracy on out-of-domain datasets. On the DG-MGT average, the reported score is […] accuracy and […] F1. The ablations are especially important conceptually: removing domain-specific experts or shared experts degrades performance, and the RL-based routing strategy outperforms random routing, classifier-based domain inference, and the other reported inference-time alternatives. This establishes ADR here as label-free, instance-wise expert selection rather than static domain assignment.
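Label-free, instance-wise selection at inference can be sketched as follows. The selection of the highest-probability domains follows the description above; the way outputs are combined (renormalized routing weights over the selected domains plus the mean of the shared experts) is an assumption made for illustration and is not DEER's published rule. The function names are hypothetical.

```python
def top_k_domains(routing_probs, k):
    """Indices of the k source domains with the highest routing probability."""
    order = sorted(range(len(routing_probs)),
                   key=lambda d: routing_probs[d], reverse=True)
    return sorted(order[:k])

def fuse(routing_probs, domain_outputs, shared_outputs, k):
    """Combine selected domain-expert outputs with shared-expert outputs.

    Assumed combination: routing weights renormalized over the selected
    domains, plus the mean of the shared-expert outputs.
    """
    chosen = top_k_domains(routing_probs, k)
    mass = sum(routing_probs[d] for d in chosen)
    dim = len(shared_outputs[0])
    fused = [0.0] * dim
    for d in chosen:
        w = routing_probs[d] / mass
        for i in range(dim):
            fused[i] += w * domain_outputs[d][i]
    for i in range(dim):
        fused[i] += sum(s[i] for s in shared_outputs) / len(shared_outputs)
    return chosen, fused

probs = [0.1, 0.6, 0.3]                      # policy output for one sample
domain_out = [[9.0, 9.0], [3.0, 0.0], [0.0, 3.0]]
shared_out = [[1.0, 1.0]]
print(fuse(probs, domain_out, shared_out, k=2))
```

No domain label is consulted: the routing probabilities alone decide which domain-specific experts run for a given sample.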

5. Mechanism-specific expert routing in financial anomaly detection

A different ADR formulation appears in "Explainable Heterogeneous Anomaly Detection in Financial Networks via Adaptive Expert Routing," where routing is both a detection mechanism and an explanation. The system assumes that financial anomalies arise from heterogeneous mechanisms rather than a single undifferentiated abnormality score, and it routes each stock-time instance across four experts: Price-Shock, Liquidity, Systemic-Contagion, and Momentum-Reversal (Li et al., 20 Oct 2025).

The architecture has four modules. First, spatial-temporal encoding combines a BiLSTM, multi-head self-attention, a GCN over a prior graph, and cross-modal attention between temporal and spatial embeddings. Second, neural dynamic graph learning interpolates temporal similarity, context similarity, and domain knowledge graphs, then fuses the learned and prior graphs with a stress-modulated coefficient

[…]

Third, the gating network concatenates the final representation, global context, market stress, and category-level feature summaries, and produces routing weights

[…]

Each expert reconstructs its mechanism-specific feature block, yielding reconstruction errors […], and the mixture error is

[…]

Fourth, multi-scale GRU decoders reconstruct short horizons […], and the final anomaly score blends MoE error with multi-scale reconstruction error.
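The scoring step can be sketched under explicit assumptions: the mixture error is taken to be the routing-weighted sum of per-expert reconstruction errors, and the final score a convex blend with the mean multi-scale error. The blend coefficient `blend` and all function names are hypothetical; only the four expert names come from the paper.

```python
EXPERTS = ["Price-Shock", "Liquidity", "Systemic-Contagion", "Momentum-Reversal"]

def mixture_error(routing_weights, expert_errors):
    """Routing-weighted sum of per-expert reconstruction errors (assumed form)."""
    return sum(w * e for w, e in zip(routing_weights, expert_errors))

def anomaly_score(routing_weights, expert_errors, multiscale_errors, blend=0.5):
    """Blend the MoE error with the mean multi-scale reconstruction error.

    blend is a hypothetical coefficient in [0, 1]; the paper's actual
    weighting is not reproduced here.
    """
    moe = mixture_error(routing_weights, expert_errors)
    multiscale = sum(multiscale_errors) / len(multiscale_errors)
    return blend * moe + (1.0 - blend) * multiscale

def dominant_mechanism(routing_weights):
    """Read the routing vector as a mechanism-level explanation."""
    best = max(range(len(routing_weights)), key=lambda i: routing_weights[i])
    return EXPERTS[best]

weights = [0.7, 0.1, 0.1, 0.1]
errors = [2.0, 1.0, 1.0, 1.0]
print(anomaly_score(weights, errors, multiscale_errors=[1.0, 3.0]))
print(dominant_mechanism(weights))  # prints Price-Shock
```

The same routing vector that weights the errors also names the dominant mechanism, which is why the explanation needs no separate post-hoc attribution step.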

The interpretability claim is architectural, not post-hoc. The routing vector itself serves as a mechanism-level explanation, and its trajectory over time provides temporal evolution tracking. The reported dataset covers […] US equities from […] to […], with […] features over […]-day rolling windows and […] documented market events. The detailed extraction reports a […] detection rate, corresponding to […] events, with a […]-day median lead time; it also reports strongest baselines at […]. The abstract, by contrast, states that the method outperforms the best baseline by […] percentage points. In the Silicon Valley Bank case study, the Price-Shock expert weight rises from a baseline of […] to […] during closure and peaks at […] one week later, while the Systemic-Contagion expert remains around […], indicating spillover without early dominance.

6. Adjacent meanings, antecedents, and limitations

The acronym ADR is materially overloaded. In LoRaWAN, ADR means Adaptive Data Rate rather than Adaptive Detection Routing. "ADR-Lite: A Low-Complexity Adaptive Data Rate Scheme for the LoRa Network" addresses centralized link adaptation at the network server by replacing the standard history-dependent packet-window logic with a binary-search-like decision rule over sorted transmission-parameter configurations […]. The scheme tracks the current configuration index and the last received packet configuration, avoids storing the history of the last received packets, and is therefore explicitly low in space complexity. In the reported mobile scenario with high channel noise, its Packet Delivery Ratio is […] times that of the original ADR and […] times that of other relevant algorithms. The paper also notes an important realism caveat: while […] is adjustable in simulation, this may not always be realistic in actual deployments (Serati et al., 2022).

Related adaptive-routing work provides algorithmic background without being detection-specific. "Adaptive routing protocols for determining optimal paths in AI multi-agent systems" proposes APBDA, an adaptive priority-based Dijkstra’s algorithm whose edge cost combines task complexity, user priority, agent capability, availability, bandwidth, latency, load, model sophistication, and reliability, with weights later adapted via reinforcement learning (Panayotov et al., 10 Mar 2025). "Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks" proposes PARN, which decouples routing from scheduling through shadow queues and a probabilistic routing table derived from shadow traffic rates, while preserving throughput-optimal stability claims and reducing queue-management complexity (Athanasopoulou et al., 2010). These systems are not ADR in the detection sense, but they illustrate the broader adaptive-routing mechanisms from which detection-oriented designs borrow.
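A composite edge cost of the APBDA kind can be illustrated with ordinary Dijkstra search over a weighted sum of edge attributes. This is a generic sketch, not the paper's algorithm: the attribute names `latency` and `load`, the fixed weights, and the function names are hypothetical, and the reinforcement-learning adaptation of weights is only mimicked by calling the search with a different weight dictionary. Costs are assumed to be non-negative.

```python
import heapq

def edge_cost(attrs, weights):
    """Weighted sum of edge attributes; weights would be adapted over time."""
    return sum(weights[name] * attrs[name] for name in weights)

def shortest_path(graph, weights, source, target):
    """Dijkstra's algorithm over composite, non-negative edge costs."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, attrs in graph.get(u, {}).items():
            nd = d + edge_cost(attrs, weights)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

graph = {
    "A": {"B": {"latency": 1.0, "load": 5.0}, "C": {"latency": 2.0, "load": 1.0}},
    "B": {"D": {"latency": 1.0, "load": 1.0}},
    "C": {"D": {"latency": 1.0, "load": 1.0}},
}
print(shortest_path(graph, {"latency": 1.0, "load": 1.0}, "A", "D"))
print(shortest_path(graph, {"latency": 1.0, "load": 0.0}, "A", "D"))
```

Changing the weights changes the selected route without touching the search itself, which is the sense in which such a protocol is adaptive.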

A further source of ambiguity is formal methods, where ADR denotes Architectural Design Rewriting. "On Recovering from Run-time Misbehaviour in ADR" extends that formalism with a tracking environment and a forest of derivation trees that record which production created each architectural element. The monitoring tree is then used to localize the part of the system affected by unexpected run-time behaviour and to suggest reconfigurations through weakest-precondition reasoning (Poyias et al., 2013). Here, routing is not traffic steering or expert selection, but monitored structural evolution.

Several limitations recur across the ADR literature. Fabric-level ADR complements rather than subsumes rerouting, since detection and localization are separated from mitigation. MoE-style detectors require explicit regularization against expert collapse, as seen in load-balancing and entropy terms. Some methods deliberately separate training-time and inference-time routing, as in Route-DETR’s auxiliary routed branch, which improves optimization while adding no inference cost. Other realism constraints are domain-specific: the YOLO MoE paper does not report latency, FLOPs, or parameter count; the financial anomaly model assumes a fixed four-expert taxonomy and does not include explicit causal or directed contagion modeling; and DEER notes that unseen domains motivate label-free adaptive routing precisely because fixed domain labels are unavailable at deployment. Taken together, these works indicate that ADR is not a single algorithmic recipe but a recurring strategy: exploit adaptive route diversity, or learn adaptive route selection, so that detection becomes more sensitive, more specialized, or more interpretable.