
GANomaly: Semi-Supervised Anomaly Detection

Updated 24 November 2025
  • GANomaly is a semi-supervised anomaly detection framework that uses an encoder–decoder–encoder architecture to capture the normal data distribution.
  • It combines reconstruction, latent consistency, and adversarial losses to enforce both pixel-level and latent-level fidelity in generated samples.
  • Empirical results show GANomaly outperforms traditional methods in image and medical signal domains while remaining efficient at inference time.

The GANomaly model is a semi-supervised anomaly detection framework that integrates autoencoder and adversarial training to characterize the distribution of normal data and to identify anomalous instances through dual reconstruction and distribution-matching losses. Originally introduced for image-based anomaly detection, GANomaly has demonstrated superior efficacy over prior state-of-the-art approaches across several benchmark domains by leveraging complementary encoding, reconstruction, and adversarial objectives that enforce both pixel-level and latent-level fidelity in generated samples (Akcay et al., 2018). Recent adaptations and quantitative evaluations in medical signal analysis (e.g., cardiotocography) further confirm its robustness and adaptability (Bertieaux et al., 2022).

1. Architectural Overview

At its core, GANomaly comprises the following principal modules:

  • Generator G, itself composed of a first encoder enc_1, a decoder dec, and a second encoder enc_2. This "encoder–decoder–encoder" structure facilitates both data-space and latent-space reconstruction.
    • enc_1 maps the input x to a low-dimensional latent code z_1 = f_1(x).
    • dec reconstructs a data-space sample x̂ = g(z_1).
    • enc_2 encodes the reconstructed sample to a latent code z_2 = f_2(x̂).
  • Discriminator D, a binary classifier trained to distinguish real samples drawn from the training data distribution from generated (reconstructed) samples x̂.

The generator minimizes three losses: contextual (reconstruction) loss, encoding (latent consistency) loss, and adversarial loss, whereas the discriminator minimizes the standard adversarial cross-entropy, thus implementing a two-player minimax game (Bertieaux et al., 2022, Akcay et al., 2018).
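A minimal PyTorch sketch of this encoder–decoder–encoder generator and the discriminator is given below, using the dense layer widths reported for the CTG configuration in Section 4; the module and argument names (Generator, Discriminator, in_dim, latent_dim) are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class Generator(nn.Module):
    """Encoder-decoder-encoder generator: x -> z1 -> x_hat -> z2.

    Layer widths follow the dense CTG configuration listed in Section 4;
    in_dim is the flattened input dimensionality (an assumption here).
    """

    def __init__(self, in_dim, latent_dim=16):
        super().__init__()
        self.enc1 = nn.Sequential(                 # enc_1: x -> z1
            nn.Linear(in_dim, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 64), nn.LeakyReLU(0.2),
            nn.Linear(64, latent_dim),
        )
        self.dec = nn.Sequential(                  # dec: z1 -> x_hat
            nn.Linear(latent_dim, 64), nn.LeakyReLU(0.2),
            nn.Linear(64, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, in_dim),                # linear output
        )
        self.enc2 = nn.Sequential(                 # enc_2: x_hat -> z2
            nn.Linear(in_dim, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, latent_dim),
        )

    def forward(self, x):
        z1 = self.enc1(x)
        x_hat = self.dec(z1)
        z2 = self.enc2(x_hat)
        return z1, x_hat, z2


class Discriminator(nn.Module):
    """Binary classifier separating real samples from reconstructions."""

    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 16), nn.LeakyReLU(0.2),
            nn.Linear(16, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)
```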

2. Loss Formulation

GANomaly employs a compound objective comprising three terms:

  • Reconstruction Loss (Contextual):

\mathcal{L}_{rec} = \mathbb{E}_{x}\left[\|x - g(f_1(x))\|_1\right]

This encourages generated samples to match the input at the pixel or feature level.

  • Encoding Loss (Latent Consistency):

\mathcal{L}_{enc} = \mathbb{E}_{x}\left[\|f_1(x) - f_2(g(f_1(x)))\|_1\right]

This ensures that the latent codes of the input and of the re-encoded reconstruction are aligned in representation space.

  • Adversarial Loss:

\mathcal{L}_{adv} = \mathbb{E}_x[\log D(x)] + \mathbb{E}_x[\log(1 - D(g(f_1(x))))]

This term aligns the distribution of reconstructed samples with the distribution of true data.

The generator is optimized to minimize:

L_G = \lambda_c \|x - \hat{x}\|_1 + \lambda_e \|z_1 - z_2\|_1 - \lambda_a\, \mathbb{E}_x[\log D(\hat{x})]

while the discriminator is optimized using:

L_D = -\mathbb{E}_x[\log D(x)] - \mathbb{E}_x[\log(1 - D(\hat{x}))]

Weights λ_c, λ_e, and λ_a determine the contributions of each term; standard values from the literature are λ_c = 50, λ_e = 1, λ_a = 1 (Akcay et al., 2018, Bertieaux et al., 2022). Notably, recent modifications (e.g., in (Bertieaux et al., 2022)) return to the standard GAN adversarial cross-entropy, eschewing the feature-matching variant to reduce redundancy with the encoding loss.
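As a compact illustration of how these terms combine, the sketch below computes L_G and L_D with the weights above. It assumes the discriminator outputs sigmoid probabilities and uses the non-saturating -log D(x̂) form for the generator's adversarial term, which matches the L_G expression given here; the helper names are hypothetical.

```python
import torch
import torch.nn.functional as F


def generator_loss(x, z1, x_hat, z2, d_fake,
                   lambda_c=50.0, lambda_e=1.0, lambda_a=1.0):
    """L_G = lambda_c*||x - x_hat||_1 + lambda_e*||z1 - z2||_1
             - lambda_a*E[log D(x_hat)] (non-saturating form)."""
    l_rec = F.l1_loss(x_hat, x)                   # contextual / reconstruction
    l_enc = F.l1_loss(z2, z1)                     # latent consistency
    l_adv = -torch.log(d_fake + 1e-8).mean()      # push D(x_hat) toward 1
    return lambda_c * l_rec + lambda_e * l_enc + lambda_a * l_adv


def discriminator_loss(d_real, d_fake):
    """L_D = -E[log D(x)] - E[log(1 - D(x_hat))]; 1e-8 added for stability."""
    return (-torch.log(d_real + 1e-8)
            - torch.log(1.0 - d_fake + 1e-8)).mean()
```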

3. Training and Inference Procedures

Training:

  • Exclusively normal-class samples are used for training.
  • Each batch passes through the generator to compute z_1, x̂, and z_2.
  • The discriminator is updated for k_d = 1 step per iteration and the generator for k_g = 2 steps, both via Adam with learning rate 2 × 10⁻⁴ and β_1 = 0.5.
  • Training typically runs until convergence; 1000–2000 epochs are reported for CTG analysis (Bertieaux et al., 2022).
  • Loss-weight hyperparameters are obtained by grid search for the best F1-score (a minimal training-loop sketch follows this list).
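Putting these pieces together, the following is a minimal training-loop sketch under the stated settings (normal-only batches, one discriminator step and two generator steps per iteration, Adam with lr = 2 × 10⁻⁴ and β_1 = 0.5). It reuses the hypothetical Generator, Discriminator, generator_loss, and discriminator_loss sketches from the preceding sections.

```python
import torch


def train(loader, in_dim, epochs=1000, device="cpu"):
    """Train GANomaly-style G and D on a loader of normal-class batches."""
    G = Generator(in_dim).to(device)
    D = Discriminator(in_dim).to(device)
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

    for _ in range(epochs):
        for x in loader:                      # batches of normal samples only
            x = x.to(device)

            # discriminator step (k_d = 1); detach so G is not updated here
            z1, x_hat, z2 = G(x)
            loss_d = discriminator_loss(D(x), D(x_hat.detach()))
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()

            # generator steps (k_g = 2)
            for _ in range(2):
                z1, x_hat, z2 = G(x)
                loss_g = generator_loss(x, z1, x_hat, z2, D(x_hat))
                opt_g.zero_grad()
                loss_g.backward()
                opt_g.step()
    return G, D
```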

Inference:

  • For any test sample, compute the reconstruction error s_ctx = ||x − x̂||_1 and the latent discrepancy s_lat = ||z_1 − z_2||_1.
  • An anomaly score (either s_ctx alone or a combination of the two) is assigned; a threshold τ = μ + 5σ, obtained empirically on held-out validation data, is applied for decision making (Bertieaux et al., 2022).
  • A sample is flagged as anomalous if s > τ (see the scoring sketch below).
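A minimal scoring sketch under these conventions follows; it assumes the hypothetical Generator from the architecture sketch above, uses s_ctx as the anomaly score, and fits τ = μ + 5σ on held-out normal validation data.

```python
import torch


@torch.no_grad()
def anomaly_scores(G, x):
    """Per-sample contextual and latent scores for a 2-D batch (N, features)."""
    z1, x_hat, z2 = G(x)
    s_ctx = (x - x_hat).abs().sum(dim=1)    # ||x - x_hat||_1 per sample
    s_lat = (z1 - z2).abs().sum(dim=1)      # ||z1 - z2||_1 per sample
    return s_ctx, s_lat


@torch.no_grad()
def fit_threshold(G, x_val):
    """tau = mu + 5*sigma of the score on held-out normal validation data."""
    s_ctx, _ = anomaly_scores(G, x_val)
    return s_ctx.mean() + 5.0 * s_ctx.std()


# Usage: flag test samples whose score exceeds tau.
# tau = fit_threshold(G, x_val)
# s_ctx, _ = anomaly_scores(G, x_test)
# is_anomalous = s_ctx > tau
```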

4. Hyperparameters and Architectures

Layer-wise and optimizer details, as applied in CTG abnormality detection (Bertieaux et al., 2022):

Module | Structure | Activation
Encoder (enc_1) | Dense(128) → Dense(64) → Dense(16) | LeakyReLU(α = 0.2)
Decoder (dec) | Dense(16) → Dense(64) → Dense(128) | LeakyReLU(α = 0.2), linear output
Second encoder (enc_2) | Dense(128) → Dense(16) | LeakyReLU(α = 0.2)
Discriminator | Dense(128) → Dense(16) → Dense(1) | LeakyReLU(α = 0.2), sigmoid output

Additional settings:

  • Adam optimizer: learning rate 2 × 10⁻⁴, β_1 = 0.5
  • Loss weights: λ_c = 50 (contextual), λ_e = 1 (encoding), λ_a = 1 (adversarial)
  • Anomaly threshold: τ = μ + 5σ, with μ and σ estimated on the normal validation set
  • Number of epochs: 1000–2000 typically required for convergence (these settings are collected in the configuration sketch below)
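For convenience, the same settings are gathered below into a single configuration object; this is a hypothetical dataclass with illustrative field names, not part of the published implementations.

```python
from dataclasses import dataclass


@dataclass
class GanomalyConfig:
    latent_dim: int = 16           # bottleneck size of enc_1 / enc_2
    lr: float = 2e-4               # Adam learning rate
    beta1: float = 0.5             # Adam beta_1
    lambda_c: float = 50.0         # contextual (reconstruction) weight
    lambda_e: float = 1.0          # encoding (latent consistency) weight
    lambda_a: float = 1.0          # adversarial weight
    epochs: int = 1500             # 1000-2000 reported for CTG convergence
    k_d: int = 1                   # discriminator steps per iteration
    k_g: int = 2                   # generator steps per iteration
    threshold_sigma: float = 5.0   # tau = mu + 5*sigma on normal validation scores
```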

5. Empirical Performance and Comparative Evaluation

Quantitative results consistently demonstrate GANomaly's state-of-the-art performance across diverse anomaly detection settings.

  • On CTU-UHB CTG data (Bertieaux et al., 2022), modified GANomaly achieves:
    • F1-score: 0.752 ± 0.011
    • Balanced accuracy: 0.750 ± 0.001
    • Precision: 0.682 ± 0.041
    • Recall: 0.663 ± 0.042
  • Baseline comparisons on held-out data:
    • Autoencoder, Isolation Forest, SVM, Random Forest, and CNN-BiLSTM+Attention all yield lower F1 and balanced accuracy, with GANomaly providing the largest areas under the ROC and precision–recall curves.
  • Image benchmark results (Akcay et al., 2018):
    • MNIST (mean over per-digit one-vs-rest splits): AUC ≈ 0.87 (vs. EGBAD ≈ 0.85, AnoGAN ≈ 0.83, VAE ≈ 0.80)
    • UBA (patches): overall AUC = 0.643; FFOB (full X-ray): AUC = 0.882
    • Inference speed per sample: ≈ 2.5 ms (substantially faster than iterative-inversion approaches)

A key observation is that, by enforcing reconstruction fidelity at both the pixel and latent levels and by incorporating adversarial distribution constraints, GANomaly distinguishes itself from both classical and deep autoencoder-based approaches (Akcay et al., 2018, Bertieaux et al., 2022).

6. Related Generative Models and Extensions

GANomaly is situated among deep generative models for anomaly detection, including:

  • AnoGAN (two-stage, slow inference)
  • EGBAD (BiGAN-based)
  • VAE-based approaches (variational autoencoder)

Skip-GANomaly (Akçay et al., 2019) further extends GANomaly by introducing U-Net-style skip connections in the generator and employing the discriminator as a feature-space latent extractor. This results in increased reconstruction quality for normal samples, more salient anomaly signals, and elevated AUC across challenging datasets (e.g., UBA: from 0.643 to 0.94; FFOB: from 0.882 to 0.903). However, GANomaly retains its advantage as a conceptually simple, scalable, and computationally efficient approach, particularly when encoders, decoders, and adversarial objectives are precisely balanced and regularized by reconstruction and latent consistency losses.

7. Application Domains and Observed Limitations

GANomaly has been validated in discrete image domains (handwritten digits, object datasets, X-ray screening) and continuous signal domains (cardiotocography). Its exclusive use of normal-class data during training and its dual focus on data- and latent-space reconstruction render it well suited for unsupervised and semi-supervised anomaly detection where anomalous samples are rare or unavailable (Akcay et al., 2018, Bertieaux et al., 2022).

A plausible implication, drawn from comparative evaluations, is that GANomaly's performance is maximized when the underlying data distribution can be effectively captured by its latent autoencoding structure and when large-scale or local anomalies substantially disrupt both pixel-level and latent reconstructions. Empirically, loss-weight calibration (especially the contextual weight λ_c) and architectural choices (layer widths, activation functions) are critical to model expressivity and detection sensitivity.

GANomaly’s efficient inference, single-stage training, and quantifiable improvement over both classical unsupervised and supervised baselines position it as a canonical architecture in adversarially-trained anomaly detection research.
