SPADE: Semi-supervised Anomaly Detection
- SPADE is a semi-supervised anomaly detection framework that addresses mismatches between labeled and unlabeled data distributions using an ensemble of one-class classifiers, self-supervised learning, and adaptive thresholding.
- It employs partial matching with Wasserstein distance to autonomously set score thresholds, ensuring robust pseudo-label assignment amid diverse anomaly types.
- Empirical results demonstrate substantial AUC improvements on tabular, image, and fraud detection tasks, validating its practical effectiveness despite distribution shifts.
SPADE refers to the Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling framework for semi-supervised anomaly detection under distribution mismatch (Yoon et al., 2022). This method addresses the challenge where labeled and unlabeled samples originate from different distributions—a frequent violation in practical anomaly detection scenarios. SPADE fuses an ensemble of one-class classifiers with robust, automatic hyperparameter selection, leveraging Wasserstein distance-based partial matching, and integrates self-supervised representation learning for enhanced performance.
1. Problem Formulation and Distribution Mismatch
SPADE is formulated for the generic semi-supervised anomaly detection problem. The dataset consists of a labeled subset $\mathcal{D}_L = \{(x_i, y_i)\}_{i=1}^{N_L}$ and an unlabeled subset $\mathcal{D}_U = \{x_j\}_{j=1}^{N_U}$, drawn from potentially distinct feature distributions $P_L$ and $P_U$ (i.e., $P_L \neq P_U$). Labels are $y_i \in \{0, 1\}$, with $y = 1$ denoting anomaly and anomalous examples being rare: $\Pr(y = 1) \ll \Pr(y = 0)$. The goal is to learn a classifier $f$ to minimize the expected risk

$$\mathbb{E}_{(x,y)}\big[\ell(f(x), y)\big],$$

without assuming matched distributions between labeled and unlabeled data.
SPADE's primary contribution is its explicit handling of this distribution mismatch, enabling robust learning when, for example, (a) labeled data contains only a subset of anomaly types, (b) labeled data contains only “easy-to-label” normals or anomalies, or (c) unlabeled data possesses non-overlapping anomaly types.
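As a concrete illustration of scenario (a), the following sketch builds a toy dataset in which the labeled anomalies cover only one of two anomaly modes, so the unlabeled pool contains an anomaly type the labels never reveal. All distributions and sizes here are illustrative assumptions, not from the paper:

```python
# Toy construction of labeled/unlabeled mismatch: labeled anomalies cover
# only one anomaly mode, while the unlabeled pool hides a second, novel mode.
import numpy as np

rng = np.random.default_rng(0)

normals     = rng.normal(0.0, 1.0, size=(1000, 2))
anomalies_a = rng.normal(5.0, 0.5, size=(30, 2))    # known anomaly type
anomalies_b = rng.normal(-5.0, 0.5, size=(30, 2))   # novel anomaly type

# Labeled set: some normals plus ONLY type-a anomalies (label bias).
x_lab = np.vstack([normals[:200], anomalies_a[:15]])
y_lab = np.concatenate([np.zeros(200), np.ones(15)])

# Unlabeled set: remaining normals plus BOTH anomaly types, without labels.
x_unl = np.vstack([normals[200:], anomalies_a[15:], anomalies_b])
```

Any method that trusts the labeled anomalies to characterize all anomalies will systematically miss the `anomalies_b` mode; SPADE's pseudo-labeler is designed to recover such points from `x_unl`.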
2. Ensemble Pseudo-labeling via One-Class Classifiers
To provide reliable pseudo-labels for unlabeled data amidst distribution mismatch, SPADE constructs an ensemble of $K$ one-class classifiers (OCCs) $\{O_k\}_{k=1}^{K}$. Each OCC is fit on the union of the labeled normal samples and a unique partition of the unlabeled data:

$$\mathcal{D}^{(k)} = \{x_i \in \mathcal{D}_L : y_i = 0\} \cup \mathcal{D}_U^{(k)}, \qquad \mathcal{D}_U = \bigcup_{k=1}^{K} \mathcal{D}_U^{(k)}.$$

The OCCs output anomaly scores $s_k(x) = O_k(g_\theta(x))$, where $g_\theta$ is the learnable feature encoder. Pseudo-labels are determined by unanimous voting and score thresholding:
- Each $O_k$ uses lower and upper thresholds $\eta_n^{(k)} \le \eta_p^{(k)}$.
- For each unlabeled $x_j$:
  - $v_k^{+}(x_j) = 1$ if $s_k(x_j) \ge \eta_p^{(k)}$, else $0$
  - $v_k^{-}(x_j) = 1$ if $s_k(x_j) \le \eta_n^{(k)}$, else $0$
- The final pseudo-label $\tilde{y}_j$ is:
  - $0$ (normal) if $\sum_{k=1}^{K} v_k^{-}(x_j) = K$
  - $1$ (anomalous) if $\sum_{k=1}^{K} v_k^{+}(x_j) = K$
  - unassigned (uncertain) otherwise
This pseudo-labeling is robust to distribution shift, as only consensus assignments are accepted.
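The voting scheme above can be sketched in a few lines, using scikit-learn Gaussian mixtures as stand-in OCCs on raw features and hand-picked thresholds; the actual framework operates on learned representations and sets thresholds via partial matching (Section 3):

```python
# Sketch of SPADE-style ensemble pseudo-labeling with unanimous voting.
# The single-component GMM scorers and the fixed thresholds are illustrative
# stand-ins, not the paper's exact models or hyperparameters.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy data: labeled normals, plus unlabeled data hiding a novel anomaly mode.
x_labeled_normal = rng.normal(0.0, 1.0, size=(200, 2))
x_unlabeled = np.vstack([
    rng.normal(0.0, 1.0, size=(180, 2)),   # mostly normal
    rng.normal(6.0, 0.5, size=(20, 2)),    # novel anomalies
])

K = 3  # ensemble size
partitions = np.array_split(rng.permutation(len(x_unlabeled)), K)

scores = np.zeros((K, len(x_unlabeled)))
for k, idx in enumerate(partitions):
    # Each OCC sees all labeled normals plus one disjoint unlabeled partition.
    occ = GaussianMixture(n_components=1, random_state=k)
    occ.fit(np.vstack([x_labeled_normal, x_unlabeled[idx]]))
    scores[k] = -occ.score_samples(x_unlabeled)  # higher = more anomalous

eta_n, eta_p = 4.5, 10.0                    # illustrative lower/upper thresholds
pos_votes = (scores >= eta_p).all(axis=0)   # unanimous "anomalous"
neg_votes = (scores <= eta_n).all(axis=0)   # unanimous "normal"

pseudo = np.full(len(x_unlabeled), -1)      # -1 = uncertain, excluded
pseudo[neg_votes] = 0
pseudo[pos_votes] = 1
```

Points without unanimous agreement stay at `-1` and are simply excluded from the pseudo-labeled training set, which is what makes the scheme conservative under mismatch.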
3. Partial Matching: Hyperparameter-Free Thresholding
Hyperparameter selection, particularly the score thresholds $\eta_p, \eta_n$, is a crucial challenge under mismatch due to the lack of representative validation data. SPADE introduces a "partial matching" principle, tuning thresholds so that the empirical score distributions of (i) positive-labeled samples and high-scoring unlabeled samples, and (ii) negative-labeled samples and low-scoring unlabeled samples, are as close as possible in Wasserstein (earth-mover's) distance:

$$\eta_p^* = \arg\min_{\eta}\, W_1\big(\{s(x) : x \in \mathcal{D}_L,\ y = 1\},\ \{s(x) : x \in \mathcal{D}_U,\ s(x) \ge \eta\}\big),$$

and analogously $\eta_n^*$ using the negative-labeled samples and the unlabeled scores below the threshold, where $W_1$ is the 1-D Wasserstein distance.
In pure positive-unlabeled (PU) settings, where only one class is labeled, SPADE defaults to Otsu-style thresholding.
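The upper-threshold search reduces to a 1-D grid scan over candidate cutoffs. The sketch below uses synthetic scores and `scipy.stats.wasserstein_distance`; the grid, distributions, and helper name are illustrative stand-ins for the paper's tuning procedure:

```python
# Sketch of "partial matching" threshold selection: scan candidate upper
# thresholds and pick the one whose top-scoring unlabeled samples best match
# the labeled-anomaly score distribution in 1-D Wasserstein distance.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
scores_labeled_pos = rng.normal(8.0, 1.0, size=50)   # labeled anomalies
scores_unlabeled = np.concatenate([
    rng.normal(2.0, 1.0, size=400),                  # mostly normal
    rng.normal(8.0, 1.0, size=40),                   # hidden anomalies
])

def pick_upper_threshold(pos_scores, unl_scores, grid):
    """Return the cutoff minimizing W1(pos scores, unlabeled scores >= cutoff)."""
    best_eta, best_dist = None, np.inf
    for eta in grid:
        matched = unl_scores[unl_scores >= eta]
        if len(matched) < 2:
            continue  # too few samples to compare distributions
        d = wasserstein_distance(pos_scores, matched)
        if d < best_dist:
            best_eta, best_dist = eta, d
    return best_eta

grid = np.quantile(scores_unlabeled, np.linspace(0.5, 0.99, 50))
eta_p = pick_upper_threshold(scores_labeled_pos, scores_unlabeled, grid)
```

Because the hidden anomalies score like the labeled ones, the selected cutoff lands near the boundary between the two score modes, without any labeled validation set.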
4. Overall Algorithmic Workflow
The SPADE training routine alternates ensemble pseudo-labeler construction and parameter updates, integrating self-supervised representation learning for the encoder. The primary steps are:
- Build the pseudo-labeler by partitioning $\mathcal{D}_U$ into $K$ disjoint subsets, training OCCs on $\{x_i \in \mathcal{D}_L : y_i = 0\} \cup \mathcal{D}_U^{(k)}$, and setting thresholds via partial matching.
- Collect pseudo-labeled points $(x_j, \tilde{y}_j)$ with unanimous OCC consensus.
- Minimize a total loss across labeled and pseudo-labeled samples, combining:
  - Supervised BCE loss over $\mathcal{D}_L$
  - BCE loss over the pseudo-labeled $(x_j, \tilde{y}_j)$
  - Self-supervised objective (e.g., reconstruction/contrastive) over both $\mathcal{D}_L$ and $\mathcal{D}_U$
- Repeat until convergence; at inference, the output anomaly score is the sigmoid $\sigma(f_\phi(g_\theta(x)))$.
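One evaluation of the combined objective can be sketched in plain numpy. The linear head, toy decoder, and the loss weights `alpha`/`beta` are all illustrative assumptions, not the paper's architecture or values:

```python
# Numpy sketch of SPADE's combined objective for one step: supervised BCE,
# BCE on pseudo-labels, and a reconstruction proxy for the self-supervised term.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

rng = np.random.default_rng(2)
d = 4
w = rng.normal(size=d)                    # toy classifier head f
W_dec = 0.9 * np.eye(d)                   # toy decoder for the self-supervised term

x_lab = rng.normal(size=(32, d)); y_lab = rng.integers(0, 2, size=32)
x_pl  = rng.normal(size=(64, d)); y_pl  = rng.integers(0, 2, size=64)  # pseudo-labels

alpha, beta = 1.0, 0.1                    # assumed loss weights
loss_sup    = bce(sigmoid(x_lab @ w), y_lab)          # supervised BCE
loss_pseudo = bce(sigmoid(x_pl @ w), y_pl)            # BCE on pseudo-labels
x_all = np.vstack([x_lab, x_pl])
loss_self   = np.mean((x_all - x_all @ W_dec) ** 2)   # reconstruction proxy
total = loss_sup + alpha * loss_pseudo + beta * loss_self
```

In the real system the pseudo-labeled batch is re-collected from the OCC ensemble each epoch, so the second term's targets change as the encoder improves.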
For tabular data, $g_\theta$ is a shallow MLP and each OCC is a Gaussian-mixture density estimator. For images, $g_\theta$ is a ResNet-18 with a CutPaste-style projection head, and each OCC is a Gaussian density estimator (GDE) over the learned representations.
5. Computational Aspects and Complexity
Typical runtime per experiment (single NVIDIA V100):
- Tabular benchmarks: on the order of an hour per run
- Image benchmarks: a few hours per run
The computational burden is dominated by self-supervised encoder updates; OCC ensemble re-training is lightweight (only once per epoch, using shallow estimators).
6. Empirical Performance and Quantitative Benchmarks
SPADE demonstrates state-of-the-art anomaly detection performance under distribution mismatch across tabular, image, and real-world fraud datasets. Key results include:
- Tabular, “new anomalies” scenario: +10.6% overall AUC (e.g., Thyroid: 0.921 vs. 0.815 supervised)
- Image, “new anomalies”: SPADE achieves 87.9% AUC vs. 81.4% (FixMatch) on MVTec, 85.2% vs. 69.1% on Magnetic datasets
- Fraud detection: On time-shifted Kaggle credit card data (5% labels), 98.2% AUC (SPADE) vs. 94.1% (VIME), vs. 97.5% (supervised); Xente: 92.0% vs. 85.9%
- Pure PU tabular: +15% AUC over BaggingPU and Elkanoto on missed anomaly types
In all cases, SPADE’s improvements stem from robust pseudo-labeling, distribution-aware thresholding, and effective self-supervised representation learning.
7. Limitations, Assumptions, and Future Directions
SPADE's performance depends on the quality of the OCC base models; unreliable density estimates on the unlabeled data can degrade pseudo-labeler precision, though the framework reports pseudo-positive label precision above 80% at high anomaly-score percentiles. The method treats labeled and unlabeled sets symmetrically at inference, so further distributional shifts at test time may warrant explicit adaptation (e.g., via domain-adversarial losses or moment alignment).
SPADE does not explicitly model fine-grained, continuous covariate drift between domains. Potential extensions include: end-to-end neural OCCs supplanting GDEs, adaptive ensemble sizes, learnable thresholding networks, or multi-class anomaly detection under mismatch.
8. Significance and Summary
SPADE provides a canonical, practically robust methodology for semi-supervised anomaly detection under realistic distribution mismatches, unifying an ensemble OCC-based pseudo-labeler, self-supervised feature learning, and unsupervised hyperparameter selection by partial distribution matching. It delivers consistent, notable improvements in AUC for both tabular and vision domains, particularly in settings with non-overlapping anomalies and major label bias, and eliminates reliance on assumption-matched validation splits (Yoon et al., 2022).