USD: Unknown Sensitive Detector
- USD is a system that automatically detects entities outside pre-labeled categories, enabling robust anomaly and novelty detection across multiple domains.
- It integrates decoupled objectness learning with auxiliary supervision like SAM, effectively flagging unknown objects in computer vision and wireless environments.
- Innovative methods such as zero-shot RF embedding, SVD-enhanced clustering, and causality inference underlie USD’s state-of-the-art performance metrics.
An Unknown Sensitive Detector (USD) is a system or methodology for automatically detecting entities (often objects, sensors, or emitters) that are “unknown”, meaning they fall outside the set of categories labeled or encountered during training or configuration. USDs have been developed and evaluated in three distinct technical contexts: open-world object detection (OWOD) in computer vision, clandestine wireless sensor and emitter detection in the physical layer, and zero-shot RF device fingerprinting under self-supervised learning. Recent works detail system architectures, methodological advances, and evaluation metrics, establishing USD as an emerging paradigm for robust anomaly, novelty, and outlier detection where ground-truth labels for the entire search space are infeasible to obtain (He et al., 2023, Singh et al., 2020, Krasnov et al., 10 Nov 2025).
1. Conceptual Frameworks and Technical Definitions
In OWOD, a USD is an object detection model capable of not only recognizing known object classes but also flagging regions or entities in an image that do not match any known class—a critical capability for deployment in unconstrained real-world environments (He et al., 2023). In wireless security and privacy, USD systems focus on identifying, classifying, and localizing wireless sensors or RF emitters that are not listed or labeled in a reference inventory, particularly those surreptitiously monitoring a person or environment (Singh et al., 2020, Krasnov et al., 10 Nov 2025).
A canonical USD is characterized by:
- Formally distinguishing between “known” (in-distribution) and “unknown” (out-of-distribution) entities.
- Operating with weak, inferred, or zero manual supervision regarding unknowns.
- Employing auxiliary, side-channel, or large-model assistance to extend or decouple detection boundaries beyond traditional supervised learning.
2. Key Architectures and Algorithmic Innovations
Open-World Object Detection (OWOD)
The USD architecture in (He et al., 2023) is built atop a deformable-DETR (DDETR) backbone with a ResNet-50 + FPN encoder, and a multi-layer (L=6) deformable transformer decoder with learnable queries. The core innovation is Decoupled Objectness Learning (DOL), which segregates the “objectness” boundary (category-agnostic) from semantic class discrimination:
- The first decoder layer predicts a category-agnostic objectness score.
- The subsequent layers ($2, \dots, L$) focus exclusively on class-specific logits and bounding-box regression.
- The composite detection probability per query combines objectness and class specificity; at inference, the two are merged through a geometric mean.
- To address annotation sparsity for unknowns, the Auxiliary Supervision Framework (ASF) introduces pseudo-labels from the Segment Anything Model (SAM), employing geometric cost functions and soft weights to mitigate label noise from backgrounds or fragments.
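The score fusion described in the list above can be sketched as follows. This is a minimal illustration, assuming a weighted geometric mean in which a hypothetical exponent `gamma` trades objectness against class confidence. The source states only that a geometric mean is used; the function name, the sigmoid activations, and the weighting are assumptions.

```python
import numpy as np

def fuse_scores(objectness_logits, class_logits, gamma=0.5):
    """Fuse per-query objectness and class scores (illustrative only).

    objectness_logits: shape (Q,)   from the first decoder layer
    class_logits:      shape (Q, C) from the later decoder layers
    gamma:             hypothetical weight; 0.5 gives the plain geometric mean
    """
    obj = 1.0 / (1.0 + np.exp(-np.asarray(objectness_logits, dtype=float)))
    cls = 1.0 / (1.0 + np.exp(-np.asarray(class_logits, dtype=float)))
    # Weighted geometric mean, broadcast over the class axis.
    return (obj[:, None] ** gamma) * (cls ** (1.0 - gamma))

# Two queries with identical class logits but different objectness.
scores = fuse_scores(np.array([2.0, -2.0]),
                     np.array([[1.0, -1.0], [1.0, -1.0]]))
```

With identical class logits, the query with the higher objectness logit receives the higher fused score, which is the behavior the decoupling is meant to expose.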
Wireless Sensor/Emergent Emitter Detection
USD for wireless clandestine sensor detection in (Singh et al., 2020) implements a multi-stage pipeline:
- Passive Discovery: Wi-Fi channel hopping to enumerate all active transmitters, logging MAC addresses and traffic statistics.
- Causality Detection: Establishing correlation between user motion (from an IMU) and device network activity through statistical tests (e.g., Granger causality), to infer which devices are “sensitive” to user behavior.
- Classification: Traffic-derived features and MAC OUI enable rule-based or lightweight ML assignment of device modality (e.g., camera, motion sensor).
- Localization: Region-of-interest (ROI) is iteratively narrowed using directed trials, IMU traces, and causality confirmation.
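The causality-detection stage can be illustrated with a small Granger-style test written in plain NumPy. This is a sketch under stated assumptions, not the pipeline from the cited work: the function name, the lag order, and the synthetic `motion` and `traffic` series are hypothetical, and the statistic is a textbook F-test comparing an autoregressive model of the traffic with and without lagged motion terms.

```python
import numpy as np

def granger_f_stat(x, y, lag=2):
    """F-statistic for 'lagged x helps predict y' (Granger-style, illustrative)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y) - lag
    target = y[lag:]
    # Column k holds the series delayed by k + 1 samples.
    y_lags = np.column_stack([y[lag - k - 1 : len(y) - k - 1] for k in range(lag)])
    x_lags = np.column_stack([x[lag - k - 1 : len(x) - k - 1] for k in range(lag)])
    ones = np.ones((n, 1))
    restricted = np.hstack([ones, y_lags])        # past traffic only
    full = np.hstack([ones, y_lags, x_lags])      # past traffic + past motion

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = n - full.shape[1]
    return ((rss_r - rss_f) / lag) / (rss_f / dof)

rng = np.random.default_rng(0)
motion = rng.normal(size=400)                     # stand-in for IMU motion energy
traffic = np.zeros(400)
traffic[1:] = 0.8 * motion[:-1]                   # device reacts one sample later
traffic += 0.1 * rng.normal(size=400)
unrelated = rng.normal(size=400)                  # device that ignores the user

f_sensitive = granger_f_stat(motion, traffic)
f_unrelated = granger_f_stat(motion, unrelated)
```

A device whose traffic follows the user's motion yields a much larger statistic than an unrelated device; in practice the statistic would be compared against an F-distribution critical value to decide which devices to flag.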
Wireless zero-shot emitter detection (Krasnov et al., 10 Nov 2025) frames USD as learning an embedding function (e.g., via CNN or Kolmogorov–Arnold Networks (KANs)), followed by a distance-based decision engine that clusters training samples and declares test instances as “unknown” if their feature-embedding outlier score exceeds a tunable threshold. It innovates with 2D-Constellation representations (message-invariant histograms over I/Q samples) and SVD-based initialization to stabilize feature learning.
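A minimal sketch of the distance-based decision engine follows, assuming the outlier score is the Euclidean distance to the nearest cluster centroid. The source does not specify the score, so that choice, the function names, and the toy embeddings are illustrative assumptions.

```python
import numpy as np

def fit_centroids(embeddings, labels):
    """Mean embedding per cluster label."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    return np.stack([embeddings[labels == c].mean(axis=0)
                     for c in np.unique(labels)])

def is_unknown(embedding, centroids, threshold):
    """Flag a sample whose nearest-centroid distance exceeds the threshold."""
    dists = np.linalg.norm(centroids - np.asarray(embedding, dtype=float), axis=1)
    return bool(dists.min() > threshold)

# Toy 2D embeddings for two known emitters (hypothetical values).
train = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
centroids = fit_centroids(train, labels=[0, 0, 1, 1])

near_known = is_unknown([0.2, -0.1], centroids, threshold=1.5)
far_away = is_unknown([10.0, -10.0], centroids, threshold=1.5)
```

Here `near_known` is `False` and `far_away` is `True`. Raising the threshold trades missed unknowns for fewer false alarms, which is why the source describes it as tunable.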
3. Data Modalities, Learning Protocols, and Feature Representations
USDs rely critically on domain-specific data representations and self-supervised or weakly-supervised learning schemes:
- Object Detection (Vision): Raw images are encoded and passed through deep transformer architectures. Pseudo-labels for unknowns are generated using large vision models (LVMs) such as SAM, and noise is mitigated via soft weighting calibrated by Mahalanobis distance in feature space (He et al., 2023).
- Wireless Emitters: Two principal RF input modalities are explored (Krasnov et al., 10 Nov 2025):
  - Time-series of raw I/Q samples, fed directly to a 1D-CNN.
  - 2D-Constellation histograms, histogramming normalized I/Q points to eliminate spurious variation and emphasize device-specific fingerprints.
- Auxiliary Channels: For clandestine sensor detection, tri-axial IMU data from user smartphones enables causality-based detection protocols (Singh et al., 2020).
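The 2D-Constellation representation mentioned above can be sketched as a normalized histogram over the I/Q plane. The bin count, plot extent, unit-power normalization, and the synthetic QPSK burst are assumptions chosen for illustration; the cited work may use different settings.

```python
import numpy as np

def constellation_histogram(iq, bins=32, extent=1.5):
    """Turn complex I/Q samples into a normalized 2D occupancy histogram."""
    iq = np.asarray(iq, dtype=complex)
    # Normalize to unit RMS power so the image does not depend on receive gain.
    iq = iq / np.sqrt(np.mean(np.abs(iq) ** 2))
    hist, _, _ = np.histogram2d(
        iq.real, iq.imag, bins=bins,
        range=[[-extent, extent], [-extent, extent]],
    )
    total = hist.sum()
    return hist / total if total > 0 else hist

# Synthetic noisy QPSK burst standing in for a captured transmission.
rng = np.random.default_rng(0)
symbols = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, size=2000)))
noisy = symbols + 0.05 * (rng.normal(size=2000) + 1j * rng.normal(size=2000))
image = constellation_histogram(noisy)
```

Because the histogram discards sample order, two bursts carrying different messages with similar symbol statistics map to similar images, leaving hardware-induced distortions as the main source of variation.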
Learning can proceed via:
- Deep clustering (pseudo-label assignment with K-means over learned representations).
- Autoencoding (with reconstruction loss over input signals or images).
- Contrastive learning (SimCLR), maximizing agreement between augmentations in embedding space.
- Linear SVD initialization, which boosts learning with sparse constellation inputs.
KANs are leveraged to render the embedding process amenable to symbolic or interpretable diagnostics.
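One plausible reading of the linear SVD initialization listed above is to seed the first linear projection with the top right-singular vectors of the centered training data. The source does not describe the procedure, so the sketch below is an assumption; `svd_init` and the synthetic sparse inputs are hypothetical.

```python
import numpy as np

def svd_init(data, out_dim):
    """Initial weights of shape (out_dim, in_dim) from the top right-singular vectors."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0, keepdims=True)
    # full_matrices=False keeps vt at shape (min(n, d), d).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:out_dim]

# Sparse stand-in for flattened constellation histograms (about 10% non-zero).
rng = np.random.default_rng(0)
flat = rng.random((200, 64)) * (rng.random((200, 64)) < 0.1)

weights = svd_init(flat, out_dim=8)
embeddings = (flat - flat.mean(axis=0)) @ weights.T
```

The rows of `weights` are orthonormal, so the initial embedding is a PCA-style projection that preserves the directions of highest variance before any gradient step is taken.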
4. Quantitative Benchmarks and Evaluation Metrics
Metrics for assessing USD performance are context-dependent:
- Object Detection: Unknown Recall (U-Recall) measures the fraction of correctly flagged unknown entities; mean Average Precision (mAP) evaluates known-class detection (He et al., 2023).
- Wireless Detection: ROC-AUC for outlier/novelty detection, Normalized Mutual Information (NMI) for cluster alignment with ground truth emitters, and F1-score for binary unknown/known calls at a fixed threshold (Krasnov et al., 10 Nov 2025).
- Physical Sensor Localization: Detection accuracy (95.2%), classification rate (100%), false positives (≤5%), ROI area reduction (to ~8–12%), and Euclidean localization error (1–2 m) offer comprehensive operational metrics (Singh et al., 2020).
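As an illustration of the first metric in the list above, U-Recall can be computed by matching predicted unknown boxes to ground-truth unknown boxes. The greedy one-to-one matching and the IoU threshold of 0.5 are assumptions for this sketch; benchmark implementations may differ in detail.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def unknown_recall(gt_unknown, pred_unknown, iou_thresh=0.5):
    """Fraction of ground-truth unknown boxes matched by an unknown prediction."""
    if not gt_unknown:
        return 0.0
    used = set()
    hits = 0
    for gt in gt_unknown:
        for i, pred in enumerate(pred_unknown):
            # Each prediction may match at most one ground-truth box.
            if i not in used and iou(gt, pred) >= iou_thresh:
                used.add(i)
                hits += 1
                break
    return hits / len(gt_unknown)

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60)]
recall = unknown_recall(gt, preds)  # 0.5: one of two unknowns is recovered
```

The metric ignores unmatched predictions entirely, which is why a detector can score well on U-Recall while still producing many false unknown flags.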
Empirical evaluations report state-of-the-art improvements: U-Recall gains of 14–34 points over prior art in object detection, gains of up to 40 percentage points in ROC-AUC, NMI, and F1 for zero-shot RF USDs relative to non-SVD baselines, and robust classification and localization for wireless sensors.
5. System-Level Pipelines and Implementation Details
Open-World Object Detection
Training uses Adam optimization with configured batch sizes and learning-rate schedules, and incrementally adds exemplars for known classes. Pseudo-labels from SAM are distilled using grid prompts, IoU/stability filtering, and a two-part auxiliary loss comprising weighted objectness and bounding-box regression components. Inference combines objectness and classification probabilities through a geometric mean, with hyperparameters tuned per OWOD benchmark split (e.g., γ=0.6 vs. 0.7 for different datasets) (He et al., 2023).
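The IoU/stability filtering step can be sketched as a simple pass over mask proposals. The record layout (`bbox` as x, y, width, height plus `predicted_iou` and `stability_score`) is assumed to mirror the output of SAM's automatic mask generator; the thresholds and the rule that drops proposals overlapping known-class boxes are hypothetical choices, not values from the cited work.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_pseudo_labels(masks, known_boxes, min_pred_iou=0.9,
                         min_stability=0.9, max_known_iou=0.5):
    """Keep confident, stable proposals that do not duplicate known objects."""
    kept = []
    for m in masks:
        if m["predicted_iou"] < min_pred_iou or m["stability_score"] < min_stability:
            continue  # low-quality mask, likely a fragment or background
        x, y, w, h = m["bbox"]
        box = (x, y, x + w, y + h)
        if any(box_iou(box, k) > max_known_iou for k in known_boxes):
            continue  # already covered by a labeled known-class box
        kept.append(box)
    return kept

masks = [
    {"bbox": [0, 0, 10, 10], "predicted_iou": 0.95, "stability_score": 0.97},
    {"bbox": [50, 50, 20, 20], "predicted_iou": 0.96, "stability_score": 0.95},
    {"bbox": [100, 100, 5, 5], "predicted_iou": 0.60, "stability_score": 0.99},
]
known_boxes = [(0, 0, 10, 10)]
kept = filter_pseudo_labels(masks, known_boxes)  # [(50, 50, 70, 70)]
```

Only the second proposal survives: the first duplicates a known box and the third falls below the quality threshold. The surviving boxes would then serve as pseudo-labels for the auxiliary objectness and box-regression losses.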
Wireless Sensing
Data acquisition combines Wi-Fi sniffer logs and IMU time-series, with traces synchronized at 10 Hz to enable causality inference. Clandestine devices are localized using spatial subdivision by half-space elimination, requiring only 5–7 user trials to refine the search area to ≲10% of the starting ROI. The pipeline operates semi-passively, with active trials optional for reduced false positives (Singh et al., 2020).
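The half-space elimination idea can be illustrated on a grid of candidate positions. This is an idealized sketch: each trial is modeled as a noiseless oracle that reports which side of a cut the device lies on, standing in for the causality check on a user trial. The grid, cut directions, and function name are assumptions.

```python
import numpy as np

def localize(candidates, device_index, n_trials=5):
    """Shrink the candidate set by one half-space per trial (illustrative)."""
    idx = np.arange(len(candidates))
    fractions = []
    for t in range(n_trials):
        angle = np.pi * t / n_trials                 # rotate the cut each trial
        normal = np.array([np.cos(angle), np.sin(angle)])
        pts = candidates[idx]
        # Signed distance of each remaining candidate from a cut through the centroid.
        proj = (pts - pts.mean(axis=0)) @ normal
        side = proj >= 0
        # Hypothetical oracle: which side of the cut contains the device.
        device_side = side[idx == device_index][0]
        idx = idx[side == device_side]
        fractions.append(len(idx) / len(candidates))
    return idx, fractions

xs = np.linspace(0.0, 1.0, 40)
grid = np.array([(x, y) for x in xs for y in xs])    # 1600 candidate cells
remaining, fractions = localize(grid, device_index=1234, n_trials=5)
```

Each cut through the centroid discards roughly half of the remaining candidates, so the search area shrinks geometrically with the number of trials. Real trials are noisier than this oracle, which is consistent with the source reporting 5–7 trials to reach about 10% of the starting ROI.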
Zero-Shot Emitter Detection
Model setup allows for alternative backbones (CNN vs. KAN). SVD-augmented initialization is preferred for sparse input spaces, and deep clustering is typically advantageous for 2D-Constellation representations. End-to-end unsupervised pipelines cycle between clustering and representation steps during training, and deploy a simple distance-based thresholding scheme for unknown detection at inference (Krasnov et al., 10 Nov 2025).
6. Design Principles, Practical Insights, and Limitations
USD design requires explicit decoupling of generic objectness from semantic discrimination in both visual and RF domains to prevent degradation from semantic manifold conflicts. Auxiliary supervision from powerful LVMs (vision) or causality/model-based inference (wireless) enhances coverage of previously unseen or surreptitious stimuli. Linear SVD initialization stabilizes learning over sparse representations, and interpretable KANs provide insight into what features or subspaces drive unknown detection.
Notable insights include:
- DOL mitigates the adverse interaction between general objectness and class boundary learning.
- SAM-based pseudo-labels dramatically expand unknown object coverage, with attention to fragment and background filter efficacy.
- Time-invariant constellation maps focus detector attention on device-intrinsic fingerprints even with variable message content.
Current metrics emphasize recall, and the high rate of false positives among entities flagged as unknown underscores the need for future work on unknown-detection precision. Residual noise in large-model pseudo-labels and scalability to fine-grained or temporally evolving unknowns remain open challenges (He et al., 2023, Singh et al., 2020, Krasnov et al., 10 Nov 2025).
7. Comparative Summary
| Domain | Modality | Core Methodological Advance | SOTA Metrics |
|---|---|---|---|
| Computer Vision (OWOD) | Image | Decoupled Objectness Learning + ASF w/ SAM | +16–34 pt U-Recall vs. prior |
| Wireless Sensor Detection | Wi-Fi + IMU | Causality Detection + Trial Localization | 95.2% Detection, 1–2 m Localization |
| Zero-shot Emitter Detection | Raw I/Q, 2D-Constellation | SVD+KAN, Deep Clustering, CL | +40 pp ROC-AUC/NMI/F1 vs. baseline |
Editor's term: “USD” serves as a unifying label for systems that extend detection, classification, and localization robustly out-of-distribution, with demonstrated utility across modalities and application spaces.