
Adaptive Detection Routing (ADR) Overview

Updated 4 July 2026
  • Adaptive Detection Routing (ADR) is a design pattern that leverages adaptive route diversity or learned route selection to improve detection, localization, diagnosis, and recovery.
  • It is applied in various domains, including data-center fabrics for gray failure detection, computer vision for robust object detection, and financial anomaly detection with mechanism-specific expert routing.
  • ADR methods span operational rerouting and internal expert allocation, aiming to improve system sensitivity and interpretability while managing performance trade-offs.

Adaptive Detection Routing (ADR) denotes a family of architectures in which routing decisions are made adaptive to improve detection, localization, diagnosis, or recovery. In the recent arXiv literature, the most explicit network-systems formulation appears in "SprayCheck: Finding Gray Failures in Adaptive Routing Networks," where adaptive routing is used as a measurement signal for passive gray-failure detection in packet-spraying data-center fabrics (Krebs et al., 5 May 2026). Closely related formulations appear inside detectors themselves, where routing selects experts or modulates query interactions in object detection (Meiraz et al., 17 Nov 2025, Zhang et al., 15 Dec 2025), in machine-generated text detection under domain shift (Ma et al., 3 Nov 2025), and in financial anomaly detection via mechanism-specific expert routing (Li et al., 20 Oct 2025). The acronym is not uniform across fields: in LoRaWAN, ADR ordinarily means Adaptive Data Rate (Serati et al., 2022), while in formal methods it also denotes Architectural Design Rewriting (Poyias et al., 2013).

1. Conceptual scope and routing granularity

Across these works, the routed entity is not always a packet path. It may instead be an expert branch, a query-to-query interaction, or a reconfiguration trajectory. This suggests that ADR is best understood as a design pattern in which route diversity or route selection becomes an instrument for detection, rather than as a single standardized protocol (Krebs et al., 5 May 2026, Meiraz et al., 17 Nov 2025, Ma et al., 3 Nov 2025, Li et al., 20 Oct 2025).

| Setting | Routed entity | Detection objective |
| --- | --- | --- |
| Adaptive-routing data-center fabric | sprayed traffic across spines | gray failure detection and localization |
| YOLOv9-T Mixture-of-Experts | expert branches at three scales | robust object detection |
| DETR decoder | pairwise query interactions | reduce redundant query competition |
| Machine-generated text detection | source-domain experts plus shared experts | domain-general MGT detection |
| Financial anomaly detection | four mechanism-specific experts | mechanism attribution and early warning |

A recurrent distinction is between routing as an operational forwarding decision and routing as a learned internal allocation mechanism. SprayCheck belongs to the former category only indirectly: it does not reroute packets in the forwarding plane, but uses adaptive spraying behavior so that the control plane can reroute after failure localization. The detector-oriented works belong to the latter category: routing governs which experts or interactions are emphasized for a given input.

2. Adaptive-routing networks as observability substrates

In packet-spraying fabrics for distributed ML training, gray failures are difficult because they do not fully break a link or switch. A link may silently drop a small fraction of packets while still appearing healthy to the control plane, yet even a small loss can propagate into application slowdown in bulk-synchronous workloads. SprayCheck exploits a distinctive property of adaptive routing: in a failure-free symmetric 2-level fat tree, a large flow should be spread evenly across candidate spines in expectation, so missing packet mass from one spine becomes evidence of path loss (Krebs et al., 5 May 2026).

Its pipeline is passive and in-network. At the start of a collective, the collective library sends a flow-announcement packet carrying the flow size $N$, queue-pair or identifier information, and metadata sufficient for flow identification; the paper reports an overhead of about $17$ bytes per flow. Each source leaf isolates one cross-leaf measurement flow at a time and prioritizes it at the highest priority queue so that competing traffic does not perturb its spraying distribution. The destination leaf then counts how many marked packets arrive from each spine. For $N$ packets spread over $k$ spines, the expected healthy count per spine is

$$\lambda = \mathbb{E}[X_i] = \frac{N}{k},$$

with variance modeled approximately as $\sigma^2 \approx \lambda$ for large flows. A gray failure on spine $j$ with drop rate $p$ reduces the expected count to $\lambda(1-p)$, and SprayCheck applies a one-sided $Z$-test with threshold

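$$T = \lambda - z_{\alpha}\,\sigma \approx \lambda - z_{\alpha}\sqrt{\lambda},$$

where $z_{\alpha}$ is the standard normal critical value for the chosen significance level. If the observed count $X_j$ falls below $T$, the path is flagged.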
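The per-spine test described above can be sketched in a few lines of Python. This is a minimal illustration rather than SprayCheck's implementation: it assumes the threshold is the healthy mean minus a one-sided normal critical value times $\sqrt{\lambda}$, and the function names, the significance level `alpha`, and the sample counts are all hypothetical.

```python
import math
from statistics import NormalDist


def spine_threshold(n_packets, n_spines, alpha=0.001):
    """Lower bound on a healthy spine's packet count (assumed form)."""
    lam = n_packets / n_spines                   # expected healthy count per spine
    sigma = math.sqrt(lam)                       # assumes variance is roughly lambda
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    return lam - z_alpha * sigma


def flag_spines(counts, n_packets, alpha=0.001):
    """Return indices of spines whose observed count falls below the threshold."""
    threshold = spine_threshold(n_packets, len(counts), alpha)
    return [j for j, x in enumerate(counts) if x < threshold]


# Hypothetical per-spine counts for a 40,000-packet measurement flow on 4 spines.
counts = [10020, 9985, 9500, 10010]
threshold = spine_threshold(40000, len(counts))
flagged = flag_spines(counts, n_packets=40000)
print(round(threshold, 1), flagged)
```

A smaller `alpha` lowers the threshold and reduces false positives, at the cost of needing larger flows to expose small drop rates.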

Localization proceeds by intersection of path reports. A single failing measurement identifies a path through a spine, which maps to two candidate links, source leaf $\to$ spine and spine $\to$ destination leaf. By combining multiple reports, the central monitoring system infers the common failed link and then triggers a routing-table update. The system therefore complements rather than replaces fast rerouting: detection and localization are the front end, and rerouting remains a control-plane action.
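The intersection step can be illustrated with a short Python sketch. It assumes a single failed link and represents each failing report as a hypothetical `(source_leaf, spine, destination_leaf)` tuple; the function names and node labels are illustrative and not taken from the paper.

```python
def candidate_links(src_leaf, spine, dst_leaf):
    """A failing path through one spine implicates exactly two links."""
    return {(src_leaf, spine), (spine, dst_leaf)}


def localize(reports):
    """Intersect the candidate links of all failing path reports."""
    common = None
    for src_leaf, spine, dst_leaf in reports:
        links = candidate_links(src_leaf, spine, dst_leaf)
        common = links if common is None else common & links
    return common or set()


# Two hypothetical failing reports that share the spine-to-destination link.
reports = [("leaf1", "spine2", "leaf3"), ("leaf4", "spine2", "leaf3")]
suspects = localize(reports)
print(suspects)
```

With a single report the result still contains both candidate links, which is why multiple reports are needed before a routing-table update can target one link. Handling several simultaneous failures would need a voting or counting scheme instead of a strict intersection.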

The reported evaluation is unusually specific. In a $17$4-spine topology, SprayCheck detects and localizes a $17$5 single-link packet-drop rate within one training iteration of Llama-3 70B and a $17$6 single-link packet-drop rate within $17$7 training iterations. In the $17$8-spine testbed, the calibration study reports perfect accuracy for drop rates $17$9 on a single link with a NN0k-packet measurement flow, as well as NN1 false negatives and NN2 false positives in robustness studies. The method is also reported to have negligible performance impact from prioritizing one measured flow: in a NN3-spine congested scenario, the prioritized flow sped up by only NN4 and other flows slowed by NN5.

3. Detector-internal routing in computer vision

In object detection, ADR refers to learned routing inside the detector rather than to packet forwarding. "YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection" embeds a Mixture-of-Experts mechanism inside YOLOv9-T. The detector uses multiple YOLOv9-T experts in parallel, with experiments specifically using two experts, and a scale-specific router produces soft weights NN6 from expert features and a Hadamard-fusion interaction term

NN7

The fused logits are computed as

NN8

The model is trained end-to-end because the detection loss is computed before NMS, and a load-balancing term discourages collapse onto a single expert. Quantitatively, the reported MoE-T model trained on COCO + VisDrone reaches NN9 mAP@0.5:0.95 and kk0 AR on COCO test, and kk1 mAP@0.5:0.95 and kk2 AR on VisDrone test, improving over the reported single YOLOv9-T baselines under the same dataset conditions (Meiraz et al., 17 Nov 2025).
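Soft expert routing of this kind can be sketched as follows. This is a generic mixture-of-experts illustration, not the paper's exact formulation: it assumes the router emits one score per expert, that fusion is a softmax-weighted sum of expert logits, and that the load-balancing term penalizes deviation of mean expert usage from uniform. The Hadamard-fusion interaction term is not modeled, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(router_scores, expert_logits):
    """Softmax-weighted sum of per-expert logit vectors (assumed fusion rule)."""
    weights = softmax(router_scores)
    dim = len(expert_logits[0])
    fused = [
        sum(w * logits[i] for w, logits in zip(weights, expert_logits))
        for i in range(dim)
    ]
    return weights, fused


def load_balance_penalty(batch_weights):
    """Squared deviation of mean expert usage from uniform (assumed form)."""
    n_experts = len(batch_weights[0])
    means = [sum(w[e] for w in batch_weights) / len(batch_weights)
             for e in range(n_experts)]
    return sum((m - 1.0 / n_experts) ** 2 for m in means)


# Two experts, as in the reported experiments; the values are illustrative.
weights, fused = fuse_logits([0.0, 0.0], [[2.0, 0.0], [0.0, 4.0]])
balanced = load_balance_penalty([[0.5, 0.5], [0.5, 0.5]])
collapsed = load_balance_penalty([[1.0, 0.0], [1.0, 0.0]])
print(weights, fused, balanced, collapsed)
```

Because every step is differentiable, a detection loss computed on the fused logits can train the router and the experts jointly, and the penalty grows as usage collapses onto one expert.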

"Route-DETR: Pairwise Query Routing in Transformers for Object Detection" applies routing at a finer granularity: decoder query pairs. Its premise is that many DETR queries converge toward the same object, producing redundant refinement under one-to-one matching. Route-DETR therefore distinguishes competing from complementary queries using inter-query similarity, confidence scores, and geometry. It introduces suppressor routes, which contribute negative attention bias for competing queries, and delegator routes, which contribute positive attention bias for complementary queries. The routed self-attention is implemented by adding a learned pairwise bias matrix kk3 to the decoder self-attention logits, and the routing biases are used only during training through a dual-branch strategy, preserving standard inference efficiency. The reported gains include kk4 mAP for DINO with RN-50 under a one-to-many training strategy, a kk5 mAP gain over DINO on ResNet-50, and kk6 mAP on Swin-L. The ablation study attributes kk7 mAP to suppressor-only routing, kk8 to delegator-only routing, and kk9 when both are used together (Zhang et al., 15 Dec 2025).

These vision models also clarify a common misconception: routing need not be a hard path selection. In both systems, routing is differentiable and sample-adaptive. In the YOLO MoE detector it is soft expert weighting; in Route-DETR it is pairwise attention biasing. The routed entity is thus an internal computational pathway.
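Route-DETR's pairwise attention biasing can be sketched in the same spirit. The snippet below is a simplified, hypothetical illustration: it adds a pairwise bias to query-to-query attention logits only when a `training` flag is set, which stands in for the dual-branch strategy, and the bias values are hand-picked rather than learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def routed_attention(logits, bias, training):
    """Row-wise attention weights, with the pairwise bias applied only in training."""
    rows = []
    for i, row in enumerate(logits):
        scores = [
            value + (bias[i][j] if training else 0.0)
            for j, value in enumerate(row)
        ]
        rows.append(softmax(scores))
    return rows


# Two queries competing for the same object: a negative (suppressor) bias
# reduces the attention each pays to the other during training.
logits = [[0.0, 0.0], [0.0, 0.0]]
suppressor_bias = [[0.0, -2.0], [-2.0, 0.0]]
train_attn = routed_attention(logits, suppressor_bias, training=True)
infer_attn = routed_attention(logits, suppressor_bias, training=False)
print(train_attn, infer_attn)
```

A positive (delegator) bias would have the opposite effect, and because the bias is skipped outside training, the inference path is identical to standard self-attention.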

4. Instance-adaptive routing for machine-generated text detection

In machine-generated text detection, the principal ADR problem is domain shift. DEER, the "Disentangled mixturE-of-ExpeRts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection," separates expert specialization from expert selection. During the first stage, domain-specific experts are trained only on samples from their corresponding source domains, while domain-crossed or shared experts are trained on all source domains. For an encoded sample λ=E[Xi]=Nk,\lambda = \mathbb{E}[X_i] = \frac{N}{k},0, the domain-aware gate produces weights λ=E[Xi]=Nk,\lambda = \mathbb{E}[X_i] = \frac{N}{k},1 and the fused representation

λ=E[Xi]=Nk,\lambda = \mathbb{E}[X_i] = \frac{N}{k},2

followed by classification (Ma et al., 3 Nov 2025).

The second stage addresses the train-inference gap created by unavailable domain labels at test time. DEER formulates routing as a reinforcement-learning policy over source-domain expert groups, with state equal to the final-layer hidden representation and policy

λ=E[Xi]=Nk,\lambda = \mathbb{E}[X_i] = \frac{N}{k},3

At inference, the model keeps the top-λ=E[Xi]=Nk,\lambda = \mathbb{E}[X_i] = \frac{N}{k},4 domains with the highest routing probabilities, activates the corresponding domain-specific experts together with the shared experts, and computes

λ=E[Xi]=Nk,\lambda = \mathbb{E}[X_i] = \frac{N}{k},5

The reported experimental regime uses λ=E[Xi]=Nk,\lambda = \mathbb{E}[X_i] = \frac{N}{k},6 domains from the MAGE benchmark, with five source domains for training and five unseen domains for out-of-domain evaluation, RoBERTa-base as backbone, λ=E[Xi]=Nk,\lambda = \mathbb{E}[X_i] = \frac{N}{k},7 domain-specific experts per domain, and λ=E[Xi]=Nk,\lambda = \mathbb{E}[X_i] = \frac{N}{k},8 shared experts. DEER reports average improvements of λ=E[Xi]=Nk,\lambda = \mathbb{E}[X_i] = \frac{N}{k},9 F1 and σ2≈λ\sigma^2 \approx \lambda0 accuracy on in-domain datasets, and σ2≈λ\sigma^2 \approx \lambda1 F1 and σ2≈λ\sigma^2 \approx \lambda2 accuracy on out-of-domain datasets. On the DG-MGT average, the reported score is σ2≈λ\sigma^2 \approx \lambda3 accuracy and σ2≈λ\sigma^2 \approx \lambda4 F1. The ablations are especially important conceptually: removing domain-specific experts or shared experts degrades performance, and the RL-based routing strategy outperforms random routing, classifier-based domain inference, and the other reported inference-time alternatives. This establishes ADR here as label-free, instance-wise expert selection rather than static domain assignment.
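The inference-time selection step, keeping the highest-probability domains and activating their experts together with the shared experts, can be sketched as below. The routing probabilities are taken as given (in DEER they come from the learned policy), and the domain names, expert identifiers, and function name are hypothetical.

```python
def select_experts(routing_probs, domain_experts, shared_experts, top_k):
    """Keep the top_k domains by routing probability and collect active experts."""
    ranked = sorted(routing_probs, key=routing_probs.get, reverse=True)[:top_k]
    active = [expert for domain in ranked for expert in domain_experts[domain]]
    return ranked, active + list(shared_experts)


# Hypothetical routing probabilities for one input over three source domains.
routing_probs = {"news": 0.5, "reviews": 0.3, "qa": 0.2}
domain_experts = {
    "news": ["news_e0"],
    "reviews": ["reviews_e0"],
    "qa": ["qa_e0"],
}
shared_experts = ["shared_e0"]

domains, active = select_experts(routing_probs, domain_experts, shared_experts, top_k=2)
print(domains, active)
```

No domain label is required at test time: the selection depends only on the per-instance routing probabilities.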

5. Mechanism-specific expert routing in financial anomaly detection

A different ADR formulation appears in "Explainable Heterogeneous Anomaly Detection in Financial Networks via Adaptive Expert Routing," where routing is both a detection mechanism and an explanation. The system assumes that financial anomalies arise from heterogeneous mechanisms rather than a single undifferentiated abnormality score, and it routes each stock-time instance across four experts: Price-Shock, Liquidity, Systemic-Contagion, and Momentum-Reversal (Li et al., 20 Oct 2025).

The architecture has four modules. First, spatial-temporal encoding combines a BiLSTM, multi-head self-attention, a GCN over a prior graph, and cross-modal attention between temporal and spatial embeddings. Second, neural dynamic graph learning interpolates temporal similarity, context similarity, and domain knowledge graphs, then fuses the learned and prior graphs with a stress-modulated coefficient

σ2≈λ\sigma^2 \approx \lambda5

Third, the gating network concatenates the final representation, global context, market stress, and category-level feature summaries, and produces routing weights

σ2≈λ\sigma^2 \approx \lambda6

Each expert reconstructs its mechanism-specific feature block, yielding reconstruction errors σ2≈λ\sigma^2 \approx \lambda7, and the mixture error is

σ2≈λ\sigma^2 \approx \lambda8

Fourth, multi-scale GRU decoders reconstruct short horizons σ2≈λ\sigma^2 \approx \lambda9, and the final anomaly score blends MoE error with multi-scale reconstruction error.
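A minimal sketch of how routing weights can serve both scoring and explanation follows. It assumes the mixture error is a routing-weighted sum of per-expert reconstruction errors and that the final score is a convex blend with the mean multi-scale reconstruction error; the blend coefficient, the averaging choice, and all numeric values are assumptions for illustration, not the paper's reported formulation.

```python
EXPERTS = ("Price-Shock", "Liquidity", "Systemic-Contagion", "Momentum-Reversal")


def mixture_error(weights, errors):
    """Routing-weighted sum of per-expert reconstruction errors (assumed form)."""
    return sum(weights[name] * errors[name] for name in EXPERTS)


def anomaly_score(weights, errors, multiscale_errors, blend=0.5):
    """Convex blend of MoE error and mean multi-scale error (assumed form)."""
    multiscale = sum(multiscale_errors) / len(multiscale_errors)
    return blend * mixture_error(weights, errors) + (1.0 - blend) * multiscale


def dominant_mechanism(weights):
    """The expert with the largest routing weight, read as the explanation."""
    return max(EXPERTS, key=lambda name: weights[name])


# Hypothetical routing weights and errors for one stock-time instance.
weights = {"Price-Shock": 0.7, "Liquidity": 0.1,
           "Systemic-Contagion": 0.1, "Momentum-Reversal": 0.1}
errors = {"Price-Shock": 2.0, "Liquidity": 1.0,
          "Systemic-Contagion": 1.0, "Momentum-Reversal": 1.0}

score = anomaly_score(weights, errors, multiscale_errors=[1.0, 3.0])
mechanism = dominant_mechanism(weights)
print(round(score, 2), mechanism)
```

Tracking `weights` over successive windows gives the kind of mechanism trajectory that the case studies read as temporal evolution.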

The interpretability claim is architectural, not post-hoc. The routing vector itself serves as a mechanism-level explanation, and its trajectory over time provides temporal evolution tracking. The reported dataset covers jj0 US equities from jj1 to jj2, with jj3 features over jj4-day rolling windows and jj5 documented market events. The detailed extraction reports a jj6 detection rate, corresponding to jj7 events, with a jj8-day median lead time; it also reports strongest baselines at jj9. The abstract, by contrast, states that the method outperforms the best baseline by pp0 percentage points. In the Silicon Valley Bank case study, the Price-Shock expert weight rises from a baseline of pp1 to pp2 during closure and peaks at pp3 one week later, while the Systemic-Contagion expert remains around pp4, indicating spillover without early dominance.

6. Adjacent meanings, antecedents, and limitations

The acronym ADR is materially overloaded. In LoRaWAN, ADR means Adaptive Data Rate rather than Adaptive Detection Routing. "ADR-Lite: A Low-Complexity Adaptive Data Rate Scheme for the LoRa Network" addresses centralized link adaptation at the network server by replacing the standard history-dependent packet-window logic with a binary-search-like decision rule over sorted transmission-parameter configurations pp5. The scheme tracks the current configuration index and the last received packet configuration, avoids storing the history of the last received packets, and is therefore explicitly low in space complexity. In the reported mobile scenario with high channel noise, its Packet Delivery Ratio is pp6 times that of the original ADR and pp7 times that of other relevant algorithms. The paper also notes an important realism caveat: while pp8 is adjustable in simulation, this may not always be realistic in actual deployments (Serati et al., 2022).

Related adaptive-routing work provides algorithmic background without being detection-specific. "Adaptive routing protocols for determining optimal paths in AI multi-agent systems" proposes APBDA, an adaptive priority-based Dijkstra’s algorithm whose edge cost combines task complexity, user priority, agent capability, availability, bandwidth, latency, load, model sophistication, and reliability, with weights later adapted via reinforcement learning (Panayotov et al., 10 Mar 2025). "Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks" proposes PARN, which decouples routing from scheduling through shadow queues and a probabilistic routing table derived from shadow traffic rates, while preserving throughput-optimal stability claims and reducing queue-management complexity (Athanasopoulou et al., 2010). These systems are not ADR in the detection sense, but they illustrate the broader adaptive-routing mechanisms from which detection-oriented designs borrow.
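The idea of Dijkstra's algorithm over a composite, re-weightable edge cost can be sketched as follows. This is a generic illustration and not APBDA itself: it assumes the edge cost is a non-negative weighted sum of a few of the listed factors, and the graph, feature values, and weight settings are hypothetical. The reinforcement-learning step that adapts the weights is not modeled.

```python
import heapq


def edge_cost(features, weights):
    """Weighted sum of edge features (assumed composite cost, must be >= 0)."""
    return sum(weights[name] * features[name] for name in weights)


def shortest_path(graph, weights, source, target):
    """Standard Dijkstra over the composite cost; returns (path, total_cost)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, features in graph.get(node, []):
            candidate = d + edge_cost(features, weights)
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Hypothetical agent graph: each edge carries latency and load features.
graph = {
    "A": [("B", {"latency": 1.0, "load": 1.0}), ("C", {"latency": 5.0, "load": 0.0})],
    "B": [("C", {"latency": 1.0, "load": 1.0})],
}

path_equal, cost_equal = shortest_path(graph, {"latency": 1.0, "load": 1.0}, "A", "C")
path_load, cost_load = shortest_path(graph, {"latency": 0.5, "load": 2.0}, "A", "C")
print(path_equal, cost_equal, path_load, cost_load)
```

Changing only the weights switches the chosen route from the two-hop path to the direct edge, which is the behaviour an adaptive weighting scheme relies on.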

A further source of ambiguity is formal methods, where ADR denotes Architectural Design Rewriting. "On Recovering from Run-time Misbehaviour in ADR" extends that formalism with a tracking environment and a forest of derivation trees that record which production created each architectural element. The monitoring tree is then used to localize the part of the system affected by unexpected run-time behaviour and to suggest reconfigurations through weakest-precondition reasoning (Poyias et al., 2013). Here, routing is not traffic steering or expert selection, but monitored structural evolution.

Several limitations recur across the ADR literature. Fabric-level ADR complements rather than subsumes rerouting, since detection and localization are separated from mitigation. MoE-style detectors require explicit regularization against expert collapse, as seen in load-balancing and entropy terms. Some methods deliberately separate training-time and inference-time routing, as in Route-DETR’s auxiliary routed branch, which improves optimization while adding no inference cost. Other realism constraints are domain-specific: the YOLO MoE paper does not report latency, FLOPs, or parameter count; the financial anomaly model assumes a fixed four-expert taxonomy and does not include explicit causal or directed contagion modeling; and DEER notes that unseen domains motivate label-free adaptive routing precisely because fixed domain labels are unavailable at deployment. Taken together, these works indicate that ADR is not a single algorithmic recipe but a recurring strategy: exploit adaptive route diversity, or learn adaptive route selection, so that detection becomes more sensitive, more specialized, or more interpretable.
