SPADE: Semi-supervised Anomaly Detection
- SPADE is a semi-supervised anomaly detection framework that addresses mismatches between labeled and unlabeled data distributions using an ensemble of one-class classifiers, self-supervised learning, and adaptive thresholding.
- It employs partial matching with Wasserstein distance to autonomously set score thresholds, ensuring robust pseudo-label assignment amid diverse anomaly types.
- Empirical results demonstrate substantial AUC improvements on tabular, image, and fraud detection tasks, validating its practical effectiveness despite distribution shifts.
SPADE refers to the Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling framework for semi-supervised anomaly detection under distribution mismatch (Yoon et al., 2022). This method addresses the challenge where labeled and unlabeled samples originate from different distributions—a frequent violation in practical anomaly detection scenarios. SPADE fuses an ensemble of one-class classifiers with robust, automatic hyperparameter selection, leveraging Wasserstein distance-based partial matching, and integrates self-supervised representation learning for enhanced performance.
1. Problem Formulation and Distribution Mismatch
SPADE is formulated for the generic semi-supervised anomaly detection problem. The dataset consists of a labeled subset $\mathcal{D}_L = \{(x_i, y_i)\}_{i=1}^{N_L}$ and an unlabeled subset $\mathcal{D}_U = \{x_j\}_{j=1}^{N_U}$, drawn from potentially distinct feature distributions $P_L$ and $P_U$ (i.e., $P_L \neq P_U$). Labels are $y_i \in \{0, 1\}$, with $y = 1$ denoting anomaly and anomalous examples being rare: $\Pr(y = 1) \ll \Pr(y = 0)$. The goal is to learn a classifier $f$ to minimize the expected risk

$$\mathbb{E}_{(x,y)}\big[\ell(f(x), y)\big],$$

without assuming matched distributions between labeled and unlabeled data.
SPADE's primary contribution is its explicit handling of this distribution mismatch, enabling robust learning when, for example, (a) labeled data contains only a subset of anomaly types, (b) labeled data contains only “easy-to-label” normals or anomalies, or (c) unlabeled data possesses non-overlapping anomaly types.
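As a concrete illustration of scenario (a), the following sketch builds a toy dataset in which the labeled anomalies cover only one of two anomaly modes, so the unlabeled pool contains an anomaly type the labels never reveal. All distributions and sizes here are illustrative assumptions, not from the paper:

```python
# Toy construction of labeled/unlabeled mismatch: labeled anomalies cover
# only one anomaly mode, while the unlabeled pool hides a second, novel mode.
import numpy as np

rng = np.random.default_rng(0)

normals     = rng.normal(0.0, 1.0, size=(1000, 2))
anomalies_a = rng.normal(5.0, 0.5, size=(30, 2))    # known anomaly type
anomalies_b = rng.normal(-5.0, 0.5, size=(30, 2))   # novel anomaly type

# Labeled set: some normals plus ONLY type-a anomalies (label bias).
x_lab = np.vstack([normals[:200], anomalies_a[:15]])
y_lab = np.concatenate([np.zeros(200), np.ones(15)])

# Unlabeled set: remaining normals plus BOTH anomaly types, without labels.
x_unl = np.vstack([normals[200:], anomalies_a[15:], anomalies_b])
```

Any method that trusts the labeled anomalies to characterize all anomalies will systematically miss the `anomalies_b` mode; SPADE's pseudo-labeler is designed to recover such points from `x_unl`.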
2. Ensemble Pseudo-labeling via One-Class Classifiers
To provide reliable pseudo-labels for unlabeled data amidst distribution mismatch, SPADE constructs an ensemble of $K$ one-class classifiers (OCCs) $\{O_k\}_{k=1}^{K}$. Each OCC is fit on the union of the labeled normal samples and a unique partition of the unlabeled data:

$$\mathcal{D}^{(k)} = \{x_i \in \mathcal{D}_L : y_i = 0\} \cup \mathcal{D}_U^{(k)}, \qquad \mathcal{D}_U = \bigcup_{k=1}^{K} \mathcal{D}_U^{(k)}.$$

The OCCs output anomaly scores $s_k(x) = O_k(g_\theta(x))$, where $g_\theta$ is the learnable feature encoder. Pseudo-labels are determined by unanimous voting and score thresholding:
- Each $O_k$ uses lower and upper thresholds $\eta_n^{(k)} \le \eta_p^{(k)}$.
- For each unlabeled $x_j$:
  - $v_k^{+}(x_j) = 1$ if $s_k(x_j) \ge \eta_p^{(k)}$, else $0$
  - $v_k^{-}(x_j) = 1$ if $s_k(x_j) \le \eta_n^{(k)}$, else $0$
- The final pseudo-label $\tilde{y}_j$ is:
  - $0$ (normal) if $\sum_{k=1}^{K} v_k^{-}(x_j) = K$
  - $1$ (anomalous) if $\sum_{k=1}^{K} v_k^{+}(x_j) = K$
  - unassigned (uncertain) otherwise
This pseudo-labeling is robust to distribution shift, as only consensus assignments are accepted.
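The voting scheme above can be sketched in a few lines, using scikit-learn Gaussian mixtures as stand-in OCCs on raw features and hand-picked thresholds; the actual framework operates on learned representations and sets thresholds via partial matching (Section 3):

```python
# Sketch of SPADE-style ensemble pseudo-labeling with unanimous voting.
# The single-component GMM scorers and the fixed thresholds are illustrative
# stand-ins, not the paper's exact models or hyperparameters.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy data: labeled normals, plus unlabeled data hiding a novel anomaly mode.
x_labeled_normal = rng.normal(0.0, 1.0, size=(200, 2))
x_unlabeled = np.vstack([
    rng.normal(0.0, 1.0, size=(180, 2)),   # mostly normal
    rng.normal(6.0, 0.5, size=(20, 2)),    # novel anomalies
])

K = 3  # ensemble size
partitions = np.array_split(rng.permutation(len(x_unlabeled)), K)

scores = np.zeros((K, len(x_unlabeled)))
for k, idx in enumerate(partitions):
    # Each OCC sees all labeled normals plus one disjoint unlabeled partition.
    occ = GaussianMixture(n_components=1, random_state=k)
    occ.fit(np.vstack([x_labeled_normal, x_unlabeled[idx]]))
    scores[k] = -occ.score_samples(x_unlabeled)  # higher = more anomalous

eta_n, eta_p = 4.5, 10.0                    # illustrative lower/upper thresholds
pos_votes = (scores >= eta_p).all(axis=0)   # unanimous "anomalous"
neg_votes = (scores <= eta_n).all(axis=0)   # unanimous "normal"

pseudo = np.full(len(x_unlabeled), -1)      # -1 = uncertain, excluded
pseudo[neg_votes] = 0
pseudo[pos_votes] = 1
```

Points without unanimous agreement stay at `-1` and are simply excluded from the pseudo-labeled training set, which is what makes the scheme conservative under mismatch.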
3. Partial Matching: Hyperparameter-Free Thresholding
Hyperparameter selection, particularly the score thresholds $\eta_p, \eta_n$, is a crucial challenge under mismatch due to the lack of representative validation data. SPADE introduces a "partial matching" principle, tuning thresholds so that the empirical score distributions of (i) positive-labeled samples and high-scoring unlabeled samples, and (ii) negative-labeled samples and low-scoring unlabeled samples, are as close as possible in Wasserstein (earth-mover's) distance:

$$\eta_p^* = \arg\min_{\eta}\, W_1\big(\{s(x) : x \in \mathcal{D}_L,\ y = 1\},\ \{s(x) : x \in \mathcal{D}_U,\ s(x) \ge \eta\}\big),$$

and analogously $\eta_n^*$ using the negative-labeled samples and the unlabeled scores below the threshold, where $W_1$ is the 1-D Wasserstein distance.
In pure positive-unlabeled (PU) settings, where only one class is labeled, SPADE defaults to Otsu-style thresholding.
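The upper-threshold search reduces to a 1-D grid scan over candidate cutoffs. The sketch below uses synthetic scores and `scipy.stats.wasserstein_distance`; the grid, distributions, and helper name are illustrative stand-ins for the paper's tuning procedure:

```python
# Sketch of "partial matching" threshold selection: scan candidate upper
# thresholds and pick the one whose top-scoring unlabeled samples best match
# the labeled-anomaly score distribution in 1-D Wasserstein distance.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
scores_labeled_pos = rng.normal(8.0, 1.0, size=50)   # labeled anomalies
scores_unlabeled = np.concatenate([
    rng.normal(2.0, 1.0, size=400),                  # mostly normal
    rng.normal(8.0, 1.0, size=40),                   # hidden anomalies
])

def pick_upper_threshold(pos_scores, unl_scores, grid):
    """Return the cutoff minimizing W1(pos scores, unlabeled scores >= cutoff)."""
    best_eta, best_dist = None, np.inf
    for eta in grid:
        matched = unl_scores[unl_scores >= eta]
        if len(matched) < 2:
            continue  # too few samples to compare distributions
        d = wasserstein_distance(pos_scores, matched)
        if d < best_dist:
            best_eta, best_dist = eta, d
    return best_eta

grid = np.quantile(scores_unlabeled, np.linspace(0.5, 0.99, 50))
eta_p = pick_upper_threshold(scores_labeled_pos, scores_unlabeled, grid)
```

Because the hidden anomalies score like the labeled ones, the selected cutoff lands near the boundary between the two score modes, without any labeled validation set.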
4. Overall Algorithmic Workflow
The SPADE training routine alternates ensemble pseudo-labeler construction and parameter updates, integrating self-supervised representation learning for the encoder. The primary steps are:
- Build the pseudo-labeler by partitioning $\mathcal{D}_U$ into $K$ disjoint subsets, training OCCs on $\{x_i \in \mathcal{D}_L : y_i = 0\} \cup \mathcal{D}_U^{(k)}$, and setting thresholds via partial matching.
- Collect pseudo-labeled points $(x_j, \tilde{y}_j)$ with unanimous OCC consensus.
- Minimize a total loss across labeled and pseudo-labeled samples, combining:
  - Supervised BCE loss over $\mathcal{D}_L$
  - BCE loss over the pseudo-labeled $(x_j, \tilde{y}_j)$
  - Self-supervised objective (e.g., reconstruction/contrastive) over both $\mathcal{D}_L$ and $\mathcal{D}_U$
- Repeat until convergence; at inference, the output anomaly score is the sigmoid $\sigma(f_\phi(g_\theta(x)))$.
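One evaluation of the combined objective can be sketched in plain numpy. The linear head, toy decoder, and the loss weights `alpha`/`beta` are all illustrative assumptions, not the paper's architecture or values:

```python
# Numpy sketch of SPADE's combined objective for one step: supervised BCE,
# BCE on pseudo-labels, and a reconstruction proxy for the self-supervised term.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

rng = np.random.default_rng(2)
d = 4
w = rng.normal(size=d)                    # toy classifier head f
W_dec = 0.9 * np.eye(d)                   # toy decoder for the self-supervised term

x_lab = rng.normal(size=(32, d)); y_lab = rng.integers(0, 2, size=32)
x_pl  = rng.normal(size=(64, d)); y_pl  = rng.integers(0, 2, size=64)  # pseudo-labels

alpha, beta = 1.0, 0.1                    # assumed loss weights
loss_sup    = bce(sigmoid(x_lab @ w), y_lab)          # supervised BCE
loss_pseudo = bce(sigmoid(x_pl @ w), y_pl)            # BCE on pseudo-labels
x_all = np.vstack([x_lab, x_pl])
loss_self   = np.mean((x_all - x_all @ W_dec) ** 2)   # reconstruction proxy
total = loss_sup + alpha * loss_pseudo + beta * loss_self
```

In the real system the pseudo-labeled batch is re-collected from the OCC ensemble each epoch, so the second term's targets change as the encoder improves.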
For tabular data, $g_\theta$ is a shallow MLP and each OCC is a Gaussian-mixture density estimator. For images, $g_\theta$ is a ResNet-18 with a CutPaste-style projection head, and each OCC is a Gaussian density estimator (GDE) over the learned representations.
5. Computational Aspects and Complexity
Typical runtime per experiment (single NVIDIA V100):
- Tabular benchmarks: on the order of an hour per run
- Image benchmarks: a few hours per run
The computational burden is dominated by self-supervised encoder updates; OCC ensemble re-training is lightweight (only once per epoch, using shallow estimators).
6. Empirical Performance and Quantitative Benchmarks
SPADE demonstrates state-of-the-art anomaly detection performance under distribution mismatch across tabular, image, and real-world fraud datasets. Key results include:
- Tabular, “new anomalies” scenario: +10.6% overall AUC (e.g., Thyroid: 0.921 vs. 0.815 supervised)
- Image, “new anomalies”: SPADE achieves 87.9% AUC vs. 81.4% (FixMatch) on MVTec, 85.2% vs. 69.1% on Magnetic datasets
- Fraud detection: On time-shifted Kaggle credit card data (5% labels), 98.2% AUC (SPADE) vs. 94.1% (VIME), vs. 97.5% (supervised); Xente: 92.0% vs. 85.9%
- Pure PU tabular: +15% AUC over BaggingPU and Elkanoto on missed anomaly types
In all cases, SPADE’s improvements stem from robust pseudo-labeling, distribution-aware thresholding, and effective self-supervised representation learning.
7. Limitations, Assumptions, and Future Directions
SPADE's performance depends on the quality of the OCC base models; unreliable density estimates on the unlabeled data can degrade pseudo-labeler precision, though the framework reports pseudo-positive label precision above 80% at high anomaly-score percentiles. The method treats labeled and unlabeled sets symmetrically at inference, so further distributional shifts at test time may warrant explicit adaptation (e.g., via domain-adversarial losses or moment alignment).
SPADE does not explicitly model fine-grained, continuous covariate drift between domains. Potential extensions include: end-to-end neural OCCs supplanting GDEs, adaptive ensemble sizes, learnable thresholding networks, or multi-class anomaly detection under mismatch.
8. Significance and Summary
SPADE provides a canonical, practically robust methodology for semi-supervised anomaly detection under realistic distribution mismatches, unifying an ensemble OCC-based pseudo-labeler, self-supervised feature learning, and unsupervised hyperparameter selection by partial distribution matching. It delivers consistent, notable improvements in AUC for both tabular and vision domains, particularly in settings with non-overlapping anomalies and major label bias, and eliminates reliance on assumption-matched validation splits (Yoon et al., 2022).