Weakly Supervised Anomaly Detection: A Survey (2302.04549v1)

Published 9 Feb 2023 in cs.LG and cs.AI

Abstract: Anomaly detection (AD) is a crucial task in machine learning with various applications, such as detecting emerging diseases, identifying financial frauds, and detecting fake news. However, obtaining complete, accurate, and precise labels for AD tasks can be expensive and challenging due to the cost and difficulties in data annotation. To address this issue, researchers have developed AD methods that can work with incomplete, inexact, and inaccurate supervision, collectively summarized as weakly supervised anomaly detection (WSAD) methods. In this study, we present the first comprehensive survey of WSAD methods by categorizing them into the above three weak supervision settings across four data modalities (i.e., tabular, graph, time-series, and image/video data). For each setting, we provide formal definitions, key algorithms, and potential future directions. To support future research, we conduct experiments on a selected setting and release the source code, along with a collection of WSAD methods and data.

Summary

The paper presents the first comprehensive survey on WSAD, categorizing methods by incomplete, inexact, and inaccurate supervision.
It reviews techniques such as anomaly feature representation, score learning, and label propagation across various data modalities.
The survey highlights future research avenues, including hybrid labeling scenarios and the use of self-supervised learning (SSL) and meta-learning to improve anomaly detection.

Weakly Supervised Anomaly Detection: A Comprehensive Survey

Anomaly detection (AD) is a core machine learning task with applications in areas such as healthcare, finance, and security. Its primary objective is to identify outliers, that is, samples that deviate from the general distribution of a dataset. In practice, complete and precise labels for AD are often unavailable because data annotation is costly and difficult. This motivates weakly supervised anomaly detection (WSAD), which tackles AD using incomplete, inexact, or inaccurate supervision. The paper presents the first comprehensive survey of WSAD methods, categorizing them into these three weak supervision settings across four data modalities: tabular, graph, time-series, and image/video data.

Key Concepts and Categorization

This survey positions WSAD within the broader context of weakly supervised learning (WSL), introducing three distinct categories:

Incomplete Supervision: Only a subset of the input data is labeled. This setting is common in AD, and methods for it rely on strategies such as anomaly feature representation learning, anomaly score learning, and label propagation on graphs to make the most of the limited label information.
Inexact Supervision: The labels are coarser than the detection task requires, typically available for a group of instances rather than for each individual instance. Methods for this setting primarily use multi-instance learning (MIL) to bridge the gap in label granularity, especially for video data.
Inaccurate Supervision: The labels contain noise and errors. Methods for this setting either use ensemble learning to aggregate information from multiple noisy label sources or use denoising networks to filter out label noise.
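To make the three settings concrete, the sketch below derives each kind of weak label from a fully labeled toy vector. It is an illustration only, not code from the survey: the function name `make_weak_labels`, its parameters, the use of `-1` for "unlabeled", and the choice of consecutive instances as bags are all assumptions made for this example.

```python
import numpy as np

def make_weak_labels(y_true, labeled_frac=0.1, bag_size=4, flip_rate=0.2, seed=0):
    """Derive toy weak labels from clean labels (1 = anomaly, 0 = normal).

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    n = y_true.size

    # Incomplete supervision: only a random subset keeps its label,
    # every other instance is marked -1 (unlabeled).
    incomplete = np.full(n, -1)
    n_labeled = max(1, int(round(labeled_frac * n)))
    keep = rng.choice(n, size=n_labeled, replace=False)
    incomplete[keep] = y_true[keep]

    # Inexact supervision: one label per bag of consecutive instances;
    # a bag is anomalous if it contains at least one anomaly.
    n_bags = n // bag_size
    inexact = y_true[: n_bags * bag_size].reshape(n_bags, bag_size).max(axis=1)

    # Inaccurate supervision: every instance is labeled,
    # but a random fraction of the labels is flipped.
    flip = rng.random(n) < flip_rate
    inaccurate = np.where(flip, 1 - y_true, y_true)

    return incomplete, inexact, inaccurate


y = np.array([0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0])
incomplete, inexact, inaccurate = make_weak_labels(y, labeled_frac=0.25)
```

With these toy labels, the bag-level view keeps only the information that the first and third bags contain an anomaly somewhere, which is exactly the granularity gap that inexact-supervision methods have to close.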

Methodological Insights

Incomplete supervision is addressed using a variety of techniques:

Anomaly Feature Representation Learning: Methods such as DeepSAD adapt unsupervised models to make better use of partial labels. Beyond learning a feature representation, they use the labels to shape where anomalies lie in the feature space relative to normal data.
Anomaly Score Learning: These methods estimate anomaly scores directly from the data, rather than as a by-product of a learned representation, to improve detection accuracy under limited supervision.
Label Propagation in Graphs: Graph-based methods spread the few available labels along the edges of the graph, exploiting the structure of the data to infer labels for unlabeled nodes.
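As an illustration of the label propagation idea, the following sketch runs a basic clamped propagation on a small graph. It is a generic textbook-style procedure, not the algorithm of any specific method covered by the survey; the function name `propagate_labels`, the `-1` marker for unlabeled nodes, and the initial score of 0.5 are assumptions made for this example.

```python
import numpy as np

def propagate_labels(adj, labels, n_iter=50):
    """Spread known labels over a graph; returns an anomaly score per node.

    adj:    (n, n) non-negative adjacency matrix.
    labels: length-n array with 1 = anomaly, 0 = normal, -1 = unlabeled.
    Hypothetical helper for illustration only.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    known = labels >= 0

    # Row-normalize so each node averages the scores of its neighbors.
    # Isolated nodes keep an all-zero row and therefore end up with score 0.
    deg = adj.sum(axis=1, keepdims=True)
    trans = adj / np.where(deg == 0, 1.0, deg)

    # Labeled nodes start at their label, unlabeled nodes at 0.5.
    scores = np.where(known, labels, 0.5).astype(float)
    for _ in range(n_iter):
        scores = trans @ scores
        scores[known] = labels[known]  # clamp labeled nodes to their labels
    return scores


# Chain graph 0 - 1 - 2 - 3 - 4 with one labeled anomaly and one labeled normal node.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
scores = propagate_labels(adj, labels=[1, -1, -1, -1, 0])
```

On this chain the scores of the unlabeled nodes interpolate between the two labeled ends, so nodes closer to the labeled anomaly receive higher anomaly scores.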

For inexact supervision, most current research focuses on video anomaly detection using MIL frameworks, leveraging coarse-grained video labels to infer finer-level anomalies.
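A minimal sketch of how a MIL objective can use video-level labels is shown below. It implements a hinge-style ranking term of the kind commonly used in MIL-based video anomaly detection: the highest-scoring segment of an anomalous video should outscore the highest-scoring segment of a normal video by a margin. The function name `mil_ranking_loss`, the margin value, and the toy scores are assumptions for this example, and published methods typically add further terms that are omitted here.

```python
import numpy as np

def mil_ranking_loss(anomalous_bag_scores, normal_bag_scores, margin=1.0):
    """Hinge ranking loss between one anomalous and one normal bag.

    Each bag is a 1-D array of per-segment anomaly scores produced by some
    scoring model (not shown). Hypothetical helper for illustration only.
    """
    top_anomalous = np.max(anomalous_bag_scores)  # most suspicious segment
    top_normal = np.max(normal_bag_scores)        # hardest normal segment
    return float(max(0.0, margin - top_anomalous + top_normal))


# Per-segment scores for a video labeled anomalous and a video labeled normal.
anomalous_video = np.array([0.1, 0.9, 0.2])
normal_video = np.array([0.1, 0.2, 0.3])

loss = mil_ranking_loss(anomalous_video, normal_video)

# At inference time the per-segment scores themselves localize the anomaly.
most_suspicious_segment = int(np.argmax(anomalous_video))
```

Only the video-level labels enter the loss, yet minimizing it pushes individual segment scores apart, which is how coarse labels yield segment-level predictions.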

For inaccurate supervision, ensemble approaches such as ADMoE integrate multiple noisy supervision signals through a shared model structure designed for scalability and specialization.
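The simplest way to combine several noisy label sources is a majority vote, sketched below. This is a plain baseline swapped in for illustration and is not ADMoE or any other method from the survey; the function name `majority_vote` and the tie-breaking rule (ties count as normal) are assumptions made for this example.

```python
import numpy as np

def majority_vote(noisy_labels):
    """Aggregate binary labels from several noisy sources.

    noisy_labels: (n_sources, n_samples) array with 1 = anomaly, 0 = normal.
    Returns the consensus label per sample and the share of sources that
    agree with it. Hypothetical helper for illustration only.
    """
    votes = np.asarray(noisy_labels)
    anomaly_share = votes.mean(axis=0)
    consensus = (anomaly_share > 0.5).astype(int)  # ties are treated as normal
    agreement = np.where(consensus == 1, anomaly_share, 1.0 - anomaly_share)
    return consensus, agreement


# Three noisy sources labeling the same four samples.
sources = [
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
]
consensus, agreement = majority_vote(sources)
```

The agreement values give a rough per-sample confidence, which a downstream detector could use to down-weight samples on which the sources disagree.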

Implications and Future Directions

The survey identifies substantial room for work in areas that existing methods cover poorly, particularly inexact supervision for data modalities such as graphs and time-series. Hybrid labeling scenarios, in which several types of weak supervision occur at the same time, also remain underexplored. Future work should study how these supervision types interact and how they can be integrated, which could lead to more robust and flexible AD systems. Continued progress in applying semi-supervised learning, self-supervised learning (SSL), and meta-learning could drive further advances in WSAD.

Overall, the survey underlines the diversity of approaches within WSAD and points to areas where further work could extend the effectiveness and applicability of AD to real-world problems. It is accompanied by released source code and a collection of existing WSAD methods and data, intended to support future research in the field.

GitHub

GitHub - yzhao062/WSAD: A Collection of Resources for Weakly-supervised Anomaly Detection (WSAD) (170 stars)