
AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection (2206.15476v4)

Published 30 Jun 2022 in cs.LG

Abstract: Analyzing the distribution shift of data is a growing research direction in nowadays Machine Learning (ML), leading to emerging new benchmarks that focus on providing a suitable scenario for studying the generalization properties of ML models. The existing benchmarks are focused on supervised learning, and to the best of our knowledge, there is none for unsupervised learning. Therefore, we introduce an unsupervised anomaly detection benchmark with data that shifts over time, built over Kyoto-2006+, a traffic dataset for network intrusion detection. This type of data meets the premise of shifting the input distribution: it covers a large time span ($10$ years), with naturally occurring changes over time (eg users modifying their behavior patterns, and software updates). We first highlight the non-stationary nature of the data, using a basic per-feature analysis, t-SNE, and an Optimal Transport approach for measuring the overall distribution distances between years. Next, we propose AnoShift, a protocol splitting the data in IID, NEAR, and FAR testing splits. We validate the performance degradation over time with diverse models, ranging from classical approaches to deep learning. Finally, we show that by acknowledging the distribution shift problem and properly addressing it, the performance can be improved compared to the classical training which assumes independent and identically distributed data (on average, by up to $3\%$ for our approach). Dataset and code are available at https://github.com/bit-ml/AnoShift/.

Citations (16)

Summary

  • The paper presents AnoShift, a benchmark designed to assess the robustness of unsupervised anomaly detection methods against distribution shifts.
  • It evaluates both classical and deep learning models using ROC-AUC and PR-AUC metrics, revealing significant performance drops from in-distribution to far-distribution data.
  • The findings highlight the challenges in anomaly detection under non-stationary conditions and motivate the development of adaptive techniques for improved resilience.

Analysis of AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection

The paper "AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection" introduces a novel benchmark specifically designed to evaluate the performance of unsupervised anomaly detection algorithms under conditions of distribution shift—a common issue in real-world deployment that undermines model performance. The focus on unsupervised methods addresses situations where labeled data is scant or nonexistent, a scenario often encountered in practical anomaly detection tasks.

Contribution and Methodology

AnoShift evaluates how various anomaly detection algorithms fare when exposed to data distribution shifts over time. The benchmark splits the test data into three categories: In-Distribution (IID), Near-Distribution (NEAR), and Far-Distribution (FAR), to simulate realistic temporal shifts. This design enables studying the robustness of models as they transition from well-modeled data distributions (IID) to moderately shifted (NEAR) and extensively shifted (FAR) domains.
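The split protocol can be sketched as a simple partition of the Kyoto-2006+ traffic by year. The exact year boundaries below follow the spirit of the protocol (training years, then NEAR, then FAR) but are illustrative assumptions, not necessarily the paper's precise cut-offs:

```python
import pandas as pd

def anoshift_splits(df: pd.DataFrame, year_col: str = "year"):
    """Partition rows into IID / NEAR / FAR test splits by year.

    Assumed boundaries (illustrative): IID <= 2010, NEAR 2011-2013,
    FAR >= 2014. The real benchmark defines its own cut-offs.
    """
    iid = df[df[year_col] <= 2010]
    near = df[(df[year_col] >= 2011) & (df[year_col] <= 2013)]
    far = df[df[year_col] >= 2014]
    return iid, near, far
```

A model would then be trained only on (the normal portion of) the earliest years and evaluated separately on each of the three test splits.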

The benchmark evaluates a range of algorithms, including classical methods such as One-Class Support Vector Machines (OC-SVM) and Isolation Forest (IsoForest), and contemporary deep learning-based methods such as DeepSVDD, Autoencoders (AE), and a BERT-style model adapted to the traffic data. Performance metrics used in the paper include ROC-AUC, PR-AUC for inliers, and PR-AUC for outliers, providing a holistic assessment of how well models differentiate normal from anomalous data at each level of distribution shift.
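The three metrics can be computed with standard scikit-learn calls. A minimal sketch (the function name and label convention are mine, assuming label 1 = anomaly and higher scores = more anomalous):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def shift_metrics(y_true: np.ndarray, scores: np.ndarray) -> dict:
    """y_true: 1 = anomaly, 0 = normal; scores: higher = more anomalous."""
    return {
        "roc_auc": roc_auc_score(y_true, scores),
        # PR-AUC for outliers: anomalies are the positive class
        "pr_auc_out": average_precision_score(y_true, scores),
        # PR-AUC for inliers: flip labels and scores so normals are positive
        "pr_auc_in": average_precision_score(1 - y_true, -scores),
    }
```

Reporting both PR-AUC variants matters here because the inlier/outlier ratio differs across the IID, NEAR, and FAR splits, and PR-AUC is sensitive to class imbalance in a way ROC-AUC is not.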

Key Findings

  1. Classical Methods: Among the classical methods, the Local Outlier Factor (LOF) achieved the highest ROC-AUC under IID conditions at 91.50%, which declined sharply to 34.96% under far distribution shift. This substantial drop highlights the sensitivity of traditional methods to distribution shifts.
  2. Deep Learning Methods: DeepSVDD was the strongest deep learning approach on the IID subset with a ROC-AUC of 92.67%, but its performance likewise degraded as the distribution shifted, falling to 34.53% ROC-AUC on FAR. This trend was consistent across the other deep learning models, emphasizing that distribution shift is a substantial challenge for both classical and deep learning paradigms.
  3. Robustness to Shift: Across the board, all models showed degradation in performance when moving from IID to FAR datasets. This result indicates a common vulnerability to non-stationarities in the data, regardless of the algorithmic approach, necessitating robust solutions for unsupervised anomaly detection under distribution shifts.
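The degradation pattern above can be reproduced qualitatively on synthetic data: a detector fitted on one distribution ranks anomalies well on matching test data, but loses discriminative power once the inliers themselves drift away from the training distribution. A minimal sketch using Isolation Forest (the shift magnitudes and data are synthetic stand-ins, not the Kyoto-2006+ features):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# "Normal" training traffic: 8 features from a standard Gaussian
X_train = rng.normal(0.0, 1.0, size=(2000, 8))
clf = IsolationForest(random_state=0).fit(X_train)

def make_split(shift: float):
    """Inliers drift by `shift`; anomalies sit far from everything."""
    inliers = rng.normal(shift, 1.0, size=(500, 8))
    outliers = rng.normal(6.0, 1.0, size=(50, 8))
    X = np.vstack([inliers, outliers])
    y = np.r_[np.zeros(500), np.ones(50)]  # 1 = anomaly
    return X, y

results = {}
for name, shift in [("IID", 0.0), ("NEAR", 2.0), ("FAR", 4.0)]:
    X, y = make_split(shift)
    scores = -clf.score_samples(X)  # higher = more anomalous
    results[name] = roc_auc_score(y, scores)
```

On data like this, ROC-AUC is near-perfect for the unshifted split and drops as drifted inliers start to look as "anomalous" to the model as the true outliers do, mirroring the IID-to-FAR degradation the paper reports.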

Implications and Future Work

The benchmark provides a standardized platform to assess how anomaly detection models handle distribution shifts, which is pivotal for applications such as network security, fraud detection, and health monitoring. The consistent decline in efficacy under shifted distributions underscores the imperative for developing techniques explicitly designed to maintain performance amidst such shifts. Possible future directions could involve adaptive models that can learn from the features of distribution shifts or the use of transfer learning techniques to better handle non-stationary environments.

Moreover, the AnoShift benchmark could facilitate advancements by providing a rigorous testing ground for novel methods seeking to improve robustness to distribution shifts. This work challenges the anomaly detection community to rethink model design and training methodologies, aiming to ensure reliable anomaly detection in dynamic real-world settings. Consequently, further exploration into the fundamental mechanisms causing these observed degradations may foster the creation of models that inherently possess better resilience to changes in data distribution.
