Semi-Supervised Anomaly Detection Overview
- Semi-supervised anomaly detection leverages limited labeled normal or anomalous data together with abundant unlabeled samples to identify rare outliers.
- Techniques include reconstruction-based autoencoders, generative adversarial models, and PU risk-based learning to handle issues like contamination and class imbalance.
- Empirical studies show these methods improve AUC/F1 scores across modalities, demonstrating robustness to distribution mismatches and scarce anomaly labels.
Semi-supervised anomaly detection addresses identification of outliers in scenarios where only limited labeled data (typically normal samples, sometimes also labeled anomalies) are available, with the remainder of the dataset unlabeled or weakly labeled. Such settings are pervasive in technical domains including industrial inspection, cybersecurity, graph analysis, and scientific discovery, reflecting the reality that anomalous events are rare and expensive to label, while normal data is comparatively abundant. The semi-supervised regime imposes stringent constraints on the learning algorithms, demanding robustness to label scarcity, contamination risk in the unlabeled pool, class imbalance, and potential distribution mismatch between labeled and unlabeled subsets.
1. Semi-Supervised Anomaly Detection: Core Notions and Problem Formulation
The canonical setup for semi-supervised anomaly detection involves two distinct datasets: a small labeled set $\mathcal{D}_L = \{(x_i, y_i)\}_{i=1}^{n_L}$ with $y_i \in \{0, 1\}$ (typically $0$=normal, $1$=anomaly), which may contain only normal or both normal and anomalous labels, and a large unlabeled pool $\mathcal{D}_U = \{x_j\}_{j=1}^{n_U}$, which may contain both types but with unknown class balance or distribution. In practice, $n_L \ll n_U$, and labeled anomalies may be absent, present at low proportion, or differ in type from test-time anomalies.
The objective is to learn a scoring or classification function $s: \mathcal{X} \to \mathbb{R}$ that can separate normal samples from anomalous ones with high accuracy on both $\mathcal{D}_U$ and out-of-sample data drawn from the underlying mixture distribution, even when anomaly types present in $\mathcal{D}_U$ or at test time differ from those in $\mathcal{D}_L$.
Crucially, semi-supervised approaches must handle:
- Contamination: unlabeled data may not be purely normal, and assuming so degrades performance (Takahashi et al., 29 May 2024).
- Distribution Mismatch: labeled and unlabeled data may originate from different distributions (Yoon et al., 2022).
- Limited Labeled Anomaly Coverage: “unknown” anomalies at test time may not be represented during training (Lau et al., 16 Jun 2025).
- Class Imbalance: anomalies are rare, requiring bias correction and robust representation techniques.
2. Algorithmic Frameworks and Methodologies
Semi-supervised anomaly detection spans a spectrum of methodologies, often grounded in reconstructive, discriminative, generative, or risk-minimization principles, and increasingly adapted to different data modalities (tabular, image, graph, dynamical sequences).
2.1 Reconstruction-Based Approaches
A core class of techniques utilizes reconstructive autoencoders, often convolutional, trained only on normal data (one-class), leveraging the assumption that such models generalize poorly to anomalous inputs, which are then revealed by large reconstruction errors or residuals (Minhas et al., 2020). A typical example employs a deep U-Net–style autoencoder:
- Training: The network is trained using only normal (defect-free) samples with a pixelwise MSE loss, enforcing faithful representation of the normal manifold.
- Detection: For each test sample, reconstruction error is computed. Residuals are thresholded (using normal-residual statistics or validation-based F1 maximization) to produce defect (anomaly) masks.
- Performance: Achieves high F1-scores even in domains with challenging defect structures, outperforming several earlier GAN-based approaches (Minhas et al., 2020).
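The train-on-normal, threshold-on-residual recipe above can be sketched with a linear (PCA-style) autoencoder standing in for the deep U-Net; the toy data, dimensions, and the 99th-percentile threshold are illustrative assumptions, not details from (Minhas et al., 2020).

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training data: samples near a 2-D subspace of R^10 plus small noise.
basis = rng.normal(size=(10, 2))
normal_train = rng.normal(size=(200, 2)) @ basis.T + 0.05 * rng.normal(size=(200, 10))

# Fit a linear autoencoder (principal subspace) on normal data only.
mean = normal_train.mean(axis=0)
_, _, vt = np.linalg.svd(normal_train - mean, full_matrices=False)
components = vt[:2]                      # shared encoder/decoder weights

def recon_error(x):
    """Anomaly score: squared reconstruction residual."""
    z = (x - mean) @ components.T        # encode
    x_hat = z @ components + mean        # decode
    return ((x - x_hat) ** 2).sum(axis=1)

# Threshold from normal-residual statistics (here the 99th percentile).
tau = np.quantile(recon_error(normal_train), 0.99)

normal_test = rng.normal(size=(50, 2)) @ basis.T + 0.05 * rng.normal(size=(50, 10))
anomalies = 3.0 * rng.normal(size=(50, 10))   # off-manifold samples

fpr = (recon_error(normal_test) > tau).mean()   # low on held-out normals
detection = (recon_error(anomalies) > tau).mean()  # high on anomalies
```

Because the model is fit only to the normal manifold, off-manifold inputs incur large residuals and exceed the threshold, mirroring the behavior exploited by the deep variants.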
2.2 Generative Adversarial Techniques
Adversarial methods, such as GANomaly (Akcay et al., 2018) and PANDA (Barker et al., 2021), structure the model as encoder–decoder–encoder plus adversarial critic. These approaches:
- Train only on normal data—anomalous data is not needed.
- Couple latent and image reconstructions: The anomaly score is typically the discrepancy between the input's encoding and the encoding of its reconstruction.
- Employ fine-grained discriminators, perceptual loss functions, or dual-latent architectures to capture subtle anomalies.
- State-of-the-art benchmarks: These models report AUC/AUPRC results consistently superior to prior deep anomaly detection and GAN counterparts, especially on image data.
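The latent-discrepancy scoring rule used by these encoder–decoder–encoder models can be written schematically. The stand-in networks below are untrained random maps (in GANomaly-style models they are learned adversarially on normal data only), so this shows only the shape of the computation, not a working detector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical untrained stand-ins for the learned encoder and decoder.
W_enc = rng.normal(size=(4, 10)) * 0.3   # encoder weights: R^10 -> R^4
W_dec = rng.normal(size=(10, 4)) * 0.3   # decoder weights: R^4 -> R^10

def encode(x):
    return np.tanh(x @ W_enc.T)

def decode(z):
    return z @ W_dec.T

def anomaly_score(x):
    """Latent discrepancy ||E(x) - E(G(x))||: re-encode the reconstruction
    and compare it to the input's own encoding."""
    z = encode(x)
    z_hat = encode(decode(z))
    return np.linalg.norm(z - z_hat, axis=1)

scores = anomaly_score(rng.normal(size=(5, 10)))   # one score per sample
```

After adversarial training on normal data, the re-encoded reconstruction matches the original encoding for normal inputs, so large discrepancies flag anomalies.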
2.3 Positive-Unlabeled (PU) and Risk-Based Semi-Supervised Methods
When labeled anomalies are available but unlabeled data can be contaminated, risk-based learning is critical. The positive-unlabeled anomaly detection paradigm (Takahashi et al., 29 May 2024, Hien et al., 2023) formulates the objective as a minimax surrogate risk combining labeled anomalies and unlabeled data, with unbiased or nonnegative PU risk estimators:
- Typical risk function: For a deep model $f_\theta$, the loss aggregates (i) a positive risk on labeled anomalies, (ii) a negative risk on unlabeled data (assumed mostly normal but possibly contaminated) with correction for overlap, and (iii) regularization.
- Correction for contamination: Nonnegative risk estimators ensure that contaminated unlabeled risk does not introduce negative bias, preserving detection of rare anomalies.
- Estimation guarantees: Explicit finite-sample error and excess risk bounds are provided (Hien et al., 2023).
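A minimal sketch of a nonnegative PU risk estimator in the spirit of these methods, treating anomalies as the "positive" class with a sigmoid surrogate loss and an assumed contamination rate `pi_a`, might look as follows; the exact losses and weightings in the cited papers differ.

```python
import numpy as np

def sigmoid_loss(margin):
    # Sigmoid surrogate loss: small when the margin is large and positive.
    return 1.0 / (1.0 + np.exp(margin))

def nn_pu_risk(scores_anom, scores_unlab, pi_a):
    """Nonnegative PU risk with anomalies as the 'positive' class.

    pi_a: assumed anomaly proportion in the unlabeled pool (must be
    estimated separately, which is itself a known difficulty).
    """
    r_pos = sigmoid_loss(scores_anom).mean()          # anomalies scored anomalous
    r_neg_unlab = sigmoid_loss(-scores_unlab).mean()  # unlabeled treated as normal
    r_neg_anom = sigmoid_loss(-scores_anom).mean()    # correction for anomalies in U
    # Clipping at zero keeps the contaminated unlabeled term from
    # introducing negative bias.
    return pi_a * r_pos + max(0.0, r_neg_unlab - pi_a * r_neg_anom)

risk = nn_pu_risk(np.array([2.0, 1.0]), np.array([-1.0, 0.5, -2.0]), pi_a=0.05)
```

The `max(0, ...)` clip is what distinguishes the nonnegative estimator from the unbiased one, which can drive the empirical unlabeled risk below zero under contamination.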
2.4 Synthetic and Pseudo-Anomaly Generation
A major development is the empirical and theoretical justification for generating synthetic or pseudo-anomalies to compensate for insufficient anomaly coverage (Lau et al., 16 Jun 2025, Mayaki et al., 2023). This paradigm includes:
- Random uniform or noise-generated synthetic anomalies in tabular, vision, or textual domains, which are combined with real labeled anomalies in the training objective.
- Regularization via synthetic outliers produces a regression function that is continuous and can be optimized efficiently by neural networks, guaranteeing minimax-optimal convergence rates—this is mathematically proved for the first time in the semi-supervised AD setting (Lau et al., 16 Jun 2025).
- Pseudo-anomaly graph generation, where node features/embeddings are synthesized by perturbing local neighborhoods or latent spaces, as in GGAD (Qiao et al., 19 Feb 2024).
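A simple instance of the first strategy, sampling uniform pseudo-anomalies over an inflated bounding box of the normal data and mixing them with scarce real anomalies, can be sketched as follows; the box margin and sample sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_synthetic_anomalies(x_normal, n, margin=0.1):
    """Uniform pseudo-anomalies over an inflated bounding box of the data."""
    lo, hi = x_normal.min(axis=0), x_normal.max(axis=0)
    span = hi - lo
    return rng.uniform(lo - margin * span, hi + margin * span,
                       size=(n, x_normal.shape[1]))

x_normal = rng.normal(size=(300, 5))
x_real_anom = rng.normal(loc=4.0, size=(10, 5))       # scarce labeled anomalies
x_synth = sample_synthetic_anomalies(x_normal, 300)

# Combined training set for a binary scorer: label 0 = normal, 1 = anomaly.
X = np.vstack([x_normal, x_real_anom, x_synth])
y = np.concatenate([np.zeros(300), np.ones(10 + 300)])
```

Any standard binary classifier trained on `(X, y)` then plays the role of the anomaly scorer; the synthetic points fill in anomaly support the labeled set does not cover.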
2.5 Pseudo-Labeling, Ensemble, and Self-Training Schemes
Given small labeled sets, pseudo-label generation and self-training are frequent tools:
- Ensemble one-class classifiers assign pseudo-labels to unlabeled data using consensus and distribution matching ("partial matching") (Yoon et al., 2022).
- Two-stage semi-supervised learning with double verification, utilizing both local predictions and global clustering with interpretability constraints, greatly boosts reliability in network anomaly detection (Yuan et al., 18 Nov 2024).
- Contrastive learning and prototype-based clustering: Representation learning can be further regularized by associating normal samples with cluster prototypes and leveraging self-supervised or contrastive objectives, which is especially effective on graphs and when temporal dynamics are present (Han et al., 2021, Tian et al., 2023).
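The consensus pseudo-labeling idea can be sketched with an ensemble of k-NN one-class scorers that mark an unlabeled point as normal only when all members agree; the scorers, thresholds, and toy data below are illustrative assumptions, not the procedure of any cited paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def knn_score(train, query, k):
    """One-class score: distance to the k-th nearest labeled-normal neighbor."""
    d = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, k - 1]

x_normal = rng.normal(size=(100, 3))                  # small labeled-normal set
x_unlab = np.vstack([rng.normal(size=(80, 3)),        # mostly normal pool...
                     rng.normal(loc=5.0, size=(20, 3))])  # ...with contamination

# Ensemble of one-class scorers; each votes "normal" below its own threshold.
votes = []
for k in (3, 5, 7):
    s_ref = knn_score(x_normal, x_normal, k + 1)  # k+1 skips the self-distance
    s_u = knn_score(x_normal, x_unlab, k)
    votes.append(s_u <= np.quantile(s_ref, 0.95))

# Pseudo-label as normal only on full consensus; leave the rest unlabeled.
pseudo_normal = np.logical_and.reduce(votes)
```

Points without consensus stay unlabeled rather than being forced into a class, which is the mechanism that limits error propagation in self-training.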
2.6 Domain and Task Variations
- Graphs: Semi-supervised anomaly detection on attributed graphs combines GCN-based embeddings, hypersphere or AUC objectives, and local/global structural regularization (Kumagai et al., 2020, Qiao et al., 19 Feb 2024).
- Time-series/dynamics: Dynamic graph and continual anomaly detection methods employ evolving encoders, time-aware reference statistics, and memory banks to model temporal drift in normalcy while exploiting limited annotations (Tian et al., 2023, Belham et al., 1 Dec 2024).
- Distance metric modifications: Directional and monotonic anomaly detection replaces standard absolute deviations by ramp or signed distances when risk factors are known to be one-sided, improving real-world detection performance (Lenz et al., 30 Oct 2024).
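The ramp (one-sided) deviation in the last point is straightforward to write down; the function name and interface here are hypothetical.

```python
import numpy as np

def ramp_deviation(x, ref, direction=1):
    """One-sided (ramp) deviation: only excursions in the risky direction count.

    direction=+1 flags values above ref; direction=-1 flags values below.
    """
    return np.maximum(direction * (x - ref), 0.0)

x = np.array([1.0, 2.5, 0.2, 4.0])
up = ramp_deviation(x, ref=2.0)            # only upward excursions register
down = ramp_deviation(x, ref=2.0, direction=-1)  # only downward excursions
```

Substituting such a deviation for the usual absolute difference inside a distance- or neighbor-based detector restricts alarms to the direction that is actually risky.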
3. Theoretical Underpinnings and Guarantees
3.1 PAC-like Detection Guarantees
Recent progress includes wrappers that can provide provable bounds on false positive/negative rates in the semi-supervised regime, agnostic to the underlying detector (Li et al., 2022). By calibrating thresholds on small labeled sets and abstaining when the prediction is ambiguous, PAC-Wrap guarantees (with user-specified confidence) that error rates do not exceed preset tolerances, applicable across unsupervised and semi-supervised base models.
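The calibration step can be illustrated with a conservative sketch: pick a score threshold on a labeled-normal calibration set so that the false-positive rate stays below a tolerance with high probability, using a DKW/Hoeffding-style slack on the empirical quantile. This is a simplified stand-in, not the actual PAC-Wrap procedure.

```python
import numpy as np

def calibrate_threshold(scores_normal, alpha, delta):
    """Threshold so that FPR <= alpha holds with confidence ~1 - delta.

    Uses a conservative Hoeffding-style slack on the empirical quantile
    of the normal-score distribution (illustrative, not PAC-Wrap itself).
    """
    n = len(scores_normal)
    slack = np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    q = min(1.0, 1.0 - alpha + slack)
    return np.quantile(scores_normal, q)
```

With more calibration data the slack shrinks and the threshold approaches the plain empirical $(1-\alpha)$-quantile; abstention on scores near the threshold is the further step PAC-Wrap adds.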
3.2 Information-Theoretic and Risk Bounds
Frameworks based on variational mutual information and entropy offer end-to-end objective functions, unifying classical two-stage approaches in a mathematically principled fashion. Encoder–decoder–encoder models optimize mutual information (reconstruction accuracy) and embedding entropy (compactness for normals, dispersion for anomalies), resolving the contradiction inherent in prior two-stage schemes (Huang et al., 2020).
PU and risk-based approaches come with explicit estimation and excess risk bounds for both shallow and deep models, parameterized by Rademacher complexity and loss properties (Hien et al., 2023).
3.3 Minimax-Optimality via Synthetic Anomalies
Including uniform or random synthetic anomalies renders the regression function smooth, making it amenable to neural network approximation, and guarantees minimax-optimal convergence in semi-supervised AD—something unattainable using only real anomaly distributions due to discontinuities (Lau et al., 16 Jun 2025).
4. Empirical Performance and Practical Trade-offs
Broad empirical validation has established the superiority of contemporary semi-supervised anomaly detection methods across modalities:
- Deep semi-supervised detectors (e.g., RoSAS, ESAD, Elsa+) consistently outperform both unsupervised and naïve semi-supervised approaches by 5–30 AUC points across image, tabular, and sequence domains, and sustain robustness in the face of increasing unlabeled contamination (Xu et al., 2023, Huang et al., 2020, Han et al., 2021).
- Synthetic anomaly and pseudo-labeling frameworks deliver general improvements, especially in cases where labeled anomalies are scarce, missing, or misaligned with test-time distributions (Lau et al., 16 Jun 2025, Mayaki et al., 2023).
- Graph and network methods such as GGAD and SAD exploit few-shot labeled nodes to great effect, boosting AUROC/AUPRC by >20% over unsupervised graph detectors (Qiao et al., 19 Feb 2024, Tian et al., 2023).
- Interpretability and reliability: Frameworks with integrated local/global explanation and pseudo-label consistency (e.g., AnomalyAID/SADDE) achieve superior detection with robust, technician-facing explanations (Yuan et al., 18 Nov 2024).
Representative Empirical Results
| Domain | Method | Summary of Performance | Reference |
|---|---|---|---|
| Vision (images) | U-Net AE | F1=0.885 (DAGM C8 / RSDDs I) | (Minhas et al., 2020) |
| Vision (GAN) | PANDA | AUPRC 0.91 (CIFAR-10), 0.90 (MNIST); 0.834 (MVTec) | (Barker et al., 2021) |
| Tabular | RoSAS | Best AUC-PR 0.858; +20–30% over SOTA; robust to 8% contamination | (Xu et al., 2023) |
| Graph | GGAD | +21% AUROC, +39% AUPRC over best unsupervised with 15% labels | (Qiao et al., 19 Feb 2024) |
| Dynamic graph | SAD | Outperforms SOTA on all benchmarks at few-shot label rates | (Tian et al., 2023) |
| Universal | PAC-Wrap | Guarantees FPR/FNR within user-set tolerances | (Li et al., 2022) |
| Continual | CSAD (VAE+EVT) | AUC of 0.690 (MNIST), up to 6 points above EWC baseline | (Belham et al., 1 Dec 2024) |
5. Limitations and Practical Considerations
Despite progress, semi-supervised anomaly detection faces several notable limitations:
- Quality of synthetic anomalies: Poorly designed perturbed samples may not cover the true anomaly support, especially for highly structured or non-additive anomalies (Mayaki et al., 2023).
- Sensitivity to contamination and prior estimation: Risk-based and PU approaches require estimation of anomaly contamination rates in the unlabeled set, potentially challenging in extreme imbalance scenarios (Takahashi et al., 29 May 2024, Hien et al., 2023).
- Interpretability and abstention: PAC-wrapped detectors may abstain on ambiguous samples, reducing recall in some applications (Li et al., 2022).
- Computational cost: Although many methods scale well, some (e.g., full EM on large GMMs, prototype clustering) may incur high overheads on large or high-dimensional datasets (Kuusela et al., 2011, Han et al., 2021).
- Continual and open-set domains: Extensions to continual learning or domains with ongoing distribution drift require explicit mechanisms for generative replay, outlier rejection in latent space via EVT, and robust online adaptation (Belham et al., 1 Dec 2024).
- Graph domain bias: Label propagation and graph-based methods may be sensitive to graph structure and class-imbalance handling, and require careful regularization or AUC-term balancing (Kumagai et al., 2020, Qiao et al., 19 Feb 2024).
6. Directions for Future Research
Emergent research lines include:
- Domain adaptation and distribution mismatch: Formalizing and correcting for labeled–unlabeled domain shifts remains critical, with recent pseudo-labeler/ensemble frameworks providing promising blueprints (Yoon et al., 2022).
- Advanced continual learning: CSAD benchmarks are now formalized, but advanced replay, active-labeling, and meta-adaptation of hyperparameters are open avenues (Belham et al., 1 Dec 2024).
- Structured and directional metrics: Integration of domain-informed distance metrics (e.g., ramp or monotonic distances) into neural and neighbor-based detectors demonstrates improved interpretability and statistical power (Lenz et al., 30 Oct 2024).
- Interpretability and reliability: Multi-modal and multi-granular explanation frameworks are still rare, and reliability under adversarial contamination is not fully explored (Yuan et al., 18 Nov 2024).
- Unified theory: Theoretical grounding for minimax-optimal rates, objective smoothness, and robustness to anomaly support misspecification is now available but further generalization to more complex data types (multi-modal, sequential, graph-heterogeneous) is an open challenge (Lau et al., 16 Jun 2025).
7. Synthesis and Impact
Semi-supervised anomaly detection is now characterized by mature, mathematically grounded objectives enveloping reconstruction, generative, risk-minimization, and pseudo-labeling paradigms. The empirical benchmarks clearly show that modest supervision—particularly when coupled with contamination-resilient learning, synthesized pseudo-anomalies, and robust risk estimation—closes much of the performance gap with supervised methods, while providing robustness, distributional flexibility, and strong applicability to graph, sequence, and high-dimensional data. The field increasingly emphasizes both theoretical guarantees and practical trade-offs, informing deployment decisions in safety-critical, industrial, and scientific contexts (Lau et al., 16 Jun 2025, Xu et al., 2023, Minhas et al., 2020).