ADBench: Anomaly Detection Benchmark (2206.09426v2)

Published 19 Jun 2022 in cs.LG and cs.AI

Abstract: Given a long list of anomaly detection algorithms developed in the last few decades, how do they perform with regard to (i) varying levels of supervision, (ii) different types of anomalies, and (iii) noisy and corrupted data? In this work, we answer these key questions by conducting (to our best knowledge) the most comprehensive anomaly detection benchmark with 30 algorithms on 57 benchmark datasets, named ADBench. Our extensive experiments (98,436 in total) identify meaningful insights into the role of supervision and anomaly types, and unlock future directions for researchers in algorithm selection and design. With ADBench, researchers can easily conduct comprehensive and fair evaluations for newly proposed methods on the datasets (including our contributed ones from natural language and computer vision domains) against the existing baselines. To foster accessibility and reproducibility, we fully open-source ADBench and the corresponding results.

Citations (233)

Summary

  • The paper introduces ADBench, a comprehensive framework that systematically evaluates 30 anomaly detection algorithms on 57 datasets.
  • It compares varying supervision levels, revealing that semi-supervised methods excel when labeled data is limited.
  • The study highlights that aligning algorithm assumptions with anomaly types and managing noise are key for robust detection.

ADBench: A Comprehensive Benchmark for Anomaly Detection Algorithms

The paper "ADBench: Anomaly Detection Benchmark" presents a detailed evaluation framework, termed ADBench, aimed at systematically assessing the performance of anomaly detection (AD) algorithms. Researchers have developed numerous AD techniques over the years, yet comparative benchmarks evaluating them under consistent conditions have been relatively sparse, especially regarding the nuances and complexities of real-world data. This benchmark fills this gap by evaluating 30 AD algorithms on 57 datasets and offers nuanced insights essential for both academic investigation and practical application of these technologies.

The benchmark explores three key dimensions: levels of supervision, types of anomalies, and the impact of data quality and corruption. These dimensions test the robustness and adaptability of algorithms under conditions that reflect applied environments more closely than prior benchmarks.

Key Dimensions and Findings

  1. Supervision Levels:
    • The benchmark compares unsupervised, semi-supervised, and fully supervised AD techniques. One striking insight is that no unsupervised method statistically outperforms the others across all datasets, underscoring the importance of context-specific algorithm selection. This aligns with the no-free-lunch theorem in optimization, which implies that no single algorithm can be superior across all problem types.
    • Notably, semi-supervised methods generally outperform fully supervised ones when labeled data is limited: they can exploit a small amount of labeled information alongside the unlabeled majority, which makes them well suited to label-scarce settings (a minimal evaluation sketch appears after this list).
  2. Anomaly Types:
    • ADBench generates realistic synthetic data mimicking four anomaly types: local, global, dependency, and clustered anomalies. Matching an algorithm's assumptions to the type of anomaly present in the dataset is crucial; for instance, the Local Outlier Factor (LOF) excels at detecting local anomalies, reinforcing the importance of aligning an algorithm's foundational assumptions with the observed data characteristics (a toy generation sketch appears after this list).
  3. Robustness to Noise:
    • The evaluation shows that unsupervised methods are susceptible to duplicated anomalies, which erode the assumption that anomalies form a small minority of the data and thereby mask anomaly signals. Conversely, supervised methods proved more resilient to irrelevant features, thanks to label-guided feature selection.
    • Both semi- and fully supervised methods tolerate minor annotation errors, though performance degrades as label noise grows, reinforcing the value of high-quality labels in AD tasks (a small corruption sketch appears after this list).
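
To make the supervision comparison concrete, the following minimal sketch (not part of ADBench) contrasts a purely unsupervised detector with a label-efficient baseline that sees only a handful of labels, scored with AUC-ROC as in the benchmark. The toy dataset, split sizes, and the simple scores-as-features baseline (loosely inspired by methods such as XGBOD) are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, GradientBoostingClassifier
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy tabular data: 2,000 inliers around the origin plus 100 global anomalies.
X_in = rng.normal(0.0, 1.0, size=(2000, 10))
X_out = rng.uniform(-6.0, 6.0, size=(100, 10))
X = np.vstack([X_in, X_out])
y = np.r_[np.zeros(len(X_in)), np.ones(len(X_out))]

# Label-scarce regime: only 10 of the 100 anomalies (plus 200 normal points)
# carry labels, loosely mimicking ADBench's partially labeled settings.
anom_idx = np.where(y == 1)[0]
norm_idx = np.where(y == 0)[0]
labeled = np.r_[anom_idx[:10], rng.choice(norm_idx, size=200, replace=False)]

# Unsupervised baseline: Isolation Forest scores (higher = more anomalous).
iforest = IsolationForest(random_state=0).fit(X)
score_unsup = -iforest.score_samples(X)

# Label-efficient baseline: unsupervised scores as features for a classifier
# trained on the small labeled subset (a toy stand-in loosely inspired by
# scores-as-features methods such as XGBOD, not ADBench's implementation).
lof = LocalOutlierFactor(n_neighbors=20).fit(X)
feats = np.c_[score_unsup, -lof.negative_outlier_factor_]
clf = GradientBoostingClassifier(random_state=0).fit(feats[labeled], y[labeled])
score_semi = clf.predict_proba(feats)[:, 1]

print("AUC-ROC, unsupervised   :", round(roc_auc_score(y, score_unsup), 3))
print("AUC-ROC, label-efficient:", round(roc_auc_score(y, score_semi), 3))
```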
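
The four anomaly types can likewise be made concrete with a small generation sketch in the spirit of ADBench's taxonomy. The mixture model, scale factors, and per-feature density estimates below are illustrative choices, not ADBench's exact generation procedure.

```python
import copy
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Correlated "normal" data in 2-D so every anomaly type is easy to visualise.
X_normal = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=1000)
n_anom = 50

gmm = GaussianMixture(n_components=2, random_state=0).fit(X_normal)

# Local anomalies: same cluster centres, inflated covariance.
gmm_local = copy.deepcopy(gmm)
gmm_local.covariances_ = gmm.covariances_ * 5.0
X_local, _ = gmm_local.sample(n_anom)

# Clustered anomalies: covariance kept, cluster centres pushed far from the data.
gmm_cluster = copy.deepcopy(gmm)
gmm_cluster.means_ = gmm.means_ + 8.0
X_cluster, _ = gmm_cluster.sample(n_anom)

# Global anomalies: uniform samples from an expanded bounding box around the data.
lo, hi = X_normal.min(axis=0), X_normal.max(axis=0)
span = hi - lo
X_global = rng.uniform(lo - 0.5 * span, hi + 0.5 * span,
                       size=(n_anom, X_normal.shape[1]))

# Dependency anomalies: each feature sampled independently with a per-feature KDE,
# destroying the correlation structure present in the normal data.
X_dep = np.column_stack([
    KernelDensity(bandwidth=0.3).fit(X_normal[:, [j]]).sample(n_anom, random_state=0).ravel()
    for j in range(X_normal.shape[1])
])
```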
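
Finally, the robustness probes can be expressed as small corruption helpers: duplicating anomalies to erode the minority assumption, appending irrelevant noise features, and flipping a fraction of labels to simulate annotation errors. The corruption ratios below are illustrative rather than ADBench's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def duplicate_anomalies(X, y, times=3):
    """Append `times` extra copies of every anomaly, eroding the minority assumption."""
    idx = np.where(y == 1)[0]
    X_dup = np.vstack([X] + [X[idx]] * times)
    y_dup = np.r_[y, np.ones(times * len(idx))]
    return X_dup, y_dup

def add_irrelevant_features(X, ratio=0.5):
    """Append pure-noise columns amounting to `ratio` of the original feature count."""
    n_noise = max(1, int(ratio * X.shape[1]))
    return np.hstack([X, rng.normal(size=(X.shape[0], n_noise))])

def flip_labels(y, ratio=0.1):
    """Flip a random fraction of labels to simulate annotation errors."""
    y_noisy = y.astype(int).copy()
    idx = rng.choice(len(y), size=int(ratio * len(y)), replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy
```

Re-scoring detectors on the corrupted copies then quantifies how much each supervision level degrades under each perturbation.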

Practical and Theoretical Implications

  • Algorithm Selection and Design: ADBench’s results accentuate the importance of algorithm selection based on the specific anomaly characteristics of interest and the environment's data quality. Researchers and practitioners can thus make informed choices about which AD technique suits their needs, reducing the inefficiencies in trial-and-error model selection.
  • Research Opportunities: The findings promote the development of new models designed with anomaly type awareness or increased resistance to noisy conditions. Transfer learning or domain adaptation frameworks could extend the utility of semi-supervised approaches to new domains lacking extensive labeled data.
  • Future Directions in AI: As machine learning continues to advance, ADBench provides a foundational framework against which new anomaly detection methodologies can be benchmarked. The expanding applications of AD across sectors such as finance, healthcare, and cybersecurity heighten the importance of such a comprehensive evaluation mechanism.

In summary, ADBench offers a robust evaluation framework that reveals critical insights into the practical performance and limitations of current AD algorithms. By encompassing varied operational contexts and data challenges, ADBench establishes itself as a reference point for future research, influencing both algorithm development and application strategy across anomaly detection domains.
