Exploiting Autoencoder's Weakness to Generate Pseudo Anomalies
(2405.05886v2)
Published 9 May 2024 in cs.LG and cs.CV
Abstract: Due to the rare occurrence of anomalous events, a typical approach to anomaly detection is to train an autoencoder (AE) with normal data only so that it learns the patterns or representations of the normal training data. At test time, the trained AE is expected to reconstruct normal data well but anomalous data poorly. However, contrary to this expectation, anomalous data is often reconstructed well too. In order to further separate the reconstruction quality between normal and anomalous data, we propose creating pseudo anomalies from learned adaptive noise by exploiting the aforementioned weakness of AE, i.e., reconstructing anomalies too well. The generated noise is added to the normal data to create pseudo anomalies. Extensive experiments on Ped2, Avenue, ShanghaiTech, CIFAR-10, and KDDCUP datasets demonstrate the effectiveness and generic applicability of our approach in improving the discriminative capability of AEs for anomaly detection.
The paper "Exploiting Autoencoder's Weakness to Generate Pseudo Anomalies" (Astrid et al., 9 May 2024) presents a novel methodology to enhance the discriminative power of autoencoders (AEs) for anomaly detection tasks. The central premise is that conventional AEs, when trained exclusively on normal data, tend to reconstruct anomalous inputs with unexpectedly high fidelity, leading to suboptimal anomaly separation. By leveraging this reconstruction behavior, the authors propose the generation of pseudo anomalies through adaptive noise, which is subsequently used to regularize the system during training. This approach is designed to widen the reconstruction error gap between normal and anomalous instances.
Methodology
Autoencoder and Adaptive Noise Generator
The proposed framework consists of two intertwined components:
Reconstruction Autoencoder (F):
Trained solely on normal data, F is optimized to minimize reconstruction error for normal samples.
At test time, reconstruction errors serve as the primary anomaly score.
Noise Generator (G):
G is architecturally similar to an autoencoder but is tasked with generating adaptive noise based on normal inputs.
The generated noise is carefully designed such that, when superimposed on normal instances, it creates pseudo anomalies that are within a specific reconstruction boundary but are still challenging for F to reconstruct.
This adversarial-like interaction is formalized by leveraging a dual optimization objective where G seeks to maximize the noise's disruptive effect while being bounded by a hyperparameter-controlled reconstruction quality.
Joint Training Strategy
The training process involves alternating updates for both F and G. Specifically:
Autoencoder Update:
F is updated to minimize the reconstruction error for normal samples and to resist the pseudo anomalies from G. This involves the standard reconstruction loss, often defined as:
L_F = ∥x − F(x)∥²
where x represents a normal input.
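To make the objective concrete, here is a minimal numpy sketch of gradient descent on L_F for a hypothetical linear autoencoder; the dimensions, learning rate, and linear architecture are illustrative stand-ins, not the paper's deep model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny linear autoencoder F(x) = x @ E.T @ D.T, a stand-in
# for the paper's deep AE, just to make the L_F objective concrete.
d, k = 8, 2                        # input and bottleneck dimensions
E = rng.normal(0.0, 0.1, (k, d))   # encoder weights
D = rng.normal(0.0, 0.1, (d, k))   # decoder weights
X = rng.normal(0.0, 1.0, (64, d))  # batch of normal training data

def recon_loss(X, E, D):
    """Batch-averaged L_F = ∥x − F(x)∥²."""
    R = X @ E.T @ D.T
    return float(np.mean(np.sum((X - R) ** 2, axis=1)))

loss_before = recon_loss(X, E, D)

# Plain gradient descent on L_F (gradients derived by hand for the
# linear case; a real implementation would use autograd).
lr = 1e-2
for _ in range(200):
    Z = X @ E.T                # latent codes
    err = Z @ D.T - X          # reconstruction residual, shape (batch, d)
    gD = 2.0 * err.T @ Z / len(X)
    gE = 2.0 * (err @ D).T @ X / len(X)
    D -= lr * gD
    E -= lr * gE

loss_after = recon_loss(X, E, D)   # lower than loss_before
```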
Noise Generator Update:
G is updated to generate noise δ such that the pseudo anomaly x′=x+δ falls within a pre-defined reconstruction boundary of F. This is enforced via a loss function that incorporates a weighting factor λ and a pseudo anomaly probability p controlling the contribution of pseudo anomalies. The objective for G can be written as:
L_G = λ⋅∥δ∥² − p⋅∥F(x+δ) − x∥²
This formulation encourages the generation of perturbations that maximize the reconstruction error gap while not deviating excessively from the learned normal manifold.
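The alternating scheme can be sketched as follows; F and G here are toy placeholder functions and the hyperparameter values are invented for illustration, but the two losses are computed exactly as defined above:

```python
import numpy as np

rng = np.random.default_rng(1)

def F(x):
    """Placeholder reconstruction AE: damps its input toward zero.
    Stands in for the paper's trained deep autoencoder."""
    return 0.8 * x

def G(x):
    """Placeholder noise generator: proposes a small input-dependent
    perturbation δ. In the paper, G is itself an AE-like network."""
    return 0.1 * np.sign(x)

def loss_F(x):
    # L_F = ∥x − F(x)∥², batch-averaged
    return float(np.mean(np.sum((x - F(x)) ** 2, axis=1)))

def loss_G(x, delta, lam, p):
    # L_G = λ⋅∥δ∥² − p⋅∥F(x+δ) − x∥², batch-averaged, as defined above
    return float(np.mean(lam * np.sum(delta ** 2, axis=1)
                         - p * np.sum((F(x + delta) - x) ** 2, axis=1)))

p, lam = 0.2, 0.5                       # illustrative hyperparameters
for step in range(5):
    x = rng.normal(0.0, 1.0, (32, 4))   # batch of normal data
    if rng.random() < p:                # pseudo-anomaly branch
        delta = G(x)
        lf = loss_F(x + delta)          # F resists the pseudo anomaly
        lg = loss_G(x, delta, lam, p)   # G's objective (to be minimized)
    else:
        lf = loss_F(x)                  # standard reconstruction update
    # ...backprop and alternating optimizer steps for F and G go here...
```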
Hyperparameter Sensitivity
Two hyperparameters are critical in this setting:
Pseudo Anomaly Probability, p:
Determines the frequency with which pseudo anomalies are used during training.
Weighting Factor, λ:
Controls the magnitude of the noise perturbations relative to the allowable reconstruction error surrogate.
The paper includes a comprehensive ablation study over these hyperparameters, demonstrating that appropriate tuning yields strong improvements in anomaly detection performance across diverse domains.
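Such a study can be organized as a simple grid search; the grids and the `evaluate` surrogate below are hypothetical stand-ins for "train with (p, λ) and report validation AUC", not values from the paper:

```python
import itertools

def evaluate(p, lam):
    """Hypothetical surrogate for training with (p, λ) and returning a
    validation AUC; this dummy function peaks at p=0.2, λ=0.5."""
    return 1.0 - (p - 0.2) ** 2 - (lam - 0.5) ** 2

# Illustrative hyperparameter grids
p_grid = [0.05, 0.1, 0.2, 0.5]
lam_grid = [0.1, 0.5, 1.0]

# Pick the (p, λ) pair with the highest surrogate validation score
best = max(itertools.product(p_grid, lam_grid),
           key=lambda pl: evaluate(*pl))
best_auc = evaluate(*best)
```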
Experimental Results
The authors evaluate their approach on five benchmark datasets spanning video (Ped2, Avenue, ShanghaiTech), image (CIFAR-10), and network intrusion (KDDCUP) domains. Key outcomes include:
On the Ped2 and Avenue datasets, the proposed method achieves a significant improvement in frame-level Area Under the Curve (AUC) relative to both the baseline AE and variants using fixed Gaussian noise.
In CIFAR-10 experiments, the method demonstrates superior classification accuracy for anomaly detection compared to standard models.
For KDDCUP, the strategy yields notable gains in detection precision and recall, reflecting its robustness across heterogeneous data modalities.
These results indicate that the adaptive noise generation mechanism successfully enforces a wider gap in reconstruction errors between normal and anomalous data, thereby making the AE's learned representations more sensitive to subtle deviations.
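Frame-level AUC, the metric reported on the video benchmarks, can be computed directly from per-frame scores and labels; a minimal rank-based (Mann–Whitney) sketch, not the authors' evaluation code:

```python
import numpy as np

def frame_level_auc(scores, labels):
    """AUC via the Mann–Whitney statistic: the probability that a randomly
    chosen anomalous frame scores higher than a random normal frame."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]          # anomalous frames
    neg = scores[labels == 0]          # normal frames
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count as half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Anomalous frames with higher reconstruction error give AUC near 1
auc = frame_level_auc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```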
Analysis and Practical Implications
Theoretical Insights
By exploiting the over-generalization tendency of autoencoders, the proposed methodology reframes what is traditionally considered a weakness into an advantage. The design of the noise generator introduces an implicit adversarial mechanism that refines the reconstruction boundary. Consequently, the reconstruction error distribution becomes more bimodal, a desirable property that enhances threshold-based anomaly detection.
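A more bimodal error distribution makes automatic threshold selection easier. As one illustration (not from the paper), Otsu's method picks the split that maximizes between-class variance of the score histogram:

```python
import numpy as np

def otsu_threshold(scores, bins=256):
    """Threshold maximizing between-class variance: a standard recipe
    for splitting a bimodal score distribution into two classes."""
    hist, edges = np.histogram(scores, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w = hist / hist.sum()
    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = w[:i].sum(), w[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (w[:i] * centers[:i]).sum() / w0   # mean of lower class
        m1 = (w[i:] * centers[i:]).sum() / w1   # mean of upper class
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, centers[i]
    return best_t

# Synthetic bimodal errors: a normal mode near 0.1, an anomalous mode near 0.8
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0.1, 0.02, 500),
                         rng.normal(0.8, 0.05, 100)])
t = otsu_threshold(scores)   # lands in the gap between the two modes
```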
Implementation Considerations
Computational Complexity:
The joint training of F and G necessitates additional computational overhead compared to a standard AE. However, since G is discarded at inference time, the runtime complexity of the deployed model remains equivalent to that of a conventional autoencoder.
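Since only F survives to deployment, inference reduces to one reconstruction pass plus an error computation. A minimal scoring sketch, with dummy inputs standing in for data to which F has already been applied:

```python
import numpy as np

def anomaly_score(x, x_recon):
    """Per-sample squared reconstruction error ∥x − F(x)∥², used
    directly as the anomaly score at test time."""
    diff = (np.asarray(x) - np.asarray(x_recon)).reshape(len(x), -1)
    return np.sum(diff ** 2, axis=1)

# Dummy inputs and reconstructions: the second sample is reconstructed
# poorly and therefore receives the higher score.
x = np.array([[0.0, 0.0], [1.0, 1.0]])
x_recon = np.array([[0.0, 0.1], [0.0, 0.0]])
scores = anomaly_score(x, x_recon)   # scores[1] > scores[0]
```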
Hyperparameter Tuning:
Careful calibration of p and λ is essential to balance the system. Suboptimal values might either under-constrain the pseudo anomaly generation (leading to negligible impact) or over-penalize the AE during reconstruction (resulting in excessive false positives).
Scalability:
Given that the training procedure requires alternating updates, distributed or parallel training strategies can be employed to efficiently handle large-scale datasets.
Deployment Strategies
For real-world scenarios, practitioners can integrate this framework into existing anomaly detection pipelines by:
Training Augmentation:
Incorporating pseudo anomaly generation during the AE training phase without needing to alter the inference architecture.
Model Calibration:
Implementing rigorous cross-validation to select appropriate hyperparameters tailored to the specific characteristics of the dataset.
Domain Adaptation:
Given its generic formulation, the method can be seamlessly adapted to different modalities by minor architectural modifications and dataset-specific pre-processing.
Conclusion
The paper offers a technically robust framework for enhancing the discriminative capabilities of autoencoders used for anomaly detection. By systematically generating pseudo anomalies via adaptive noise and incorporating an alternating training procedure, the method addresses the prevalent issue of high-fidelity anomaly reconstruction inherent in traditional AEs. With extensive empirical validation across varied and challenging domains, the approach is demonstrated to deliver quantitatively strong improvements. This work serves as a practical guide for researchers and practitioners interested in improving anomaly detection systems, particularly in environments where the assumptions underpinning conventional methods do not hold.