A Systematic Evaluation of Deep Learning Techniques for Anomaly Detection in Multivariate Time Series
The paper "An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time Series" by Astha Garg, Wenyu Zhang, Jules Samaran, Ramasamy Savitha, and Chuan-Sheng Foo, presents a rigorous and comprehensive paper of anomaly detection and diagnosis methodologies in multivariate time series (MVTS). The focus lies primarily on unsupervised and semi-supervised deep learning approaches within cyber-physical systems (CPS). This scholarly work seeks to fill the gap of systematic comparison by utilizing a consistent set of datasets and metrics, providing insights into their efficacy via empirical evaluation.
Methodological Framework
The authors introduce a modular framework for anomaly detection in MVTS comprising three components: models, scoring functions, and thresholding functions. They evaluate 10 deep learning models in combination with 4 scoring functions. The models range from the Univariate Fully-Connected Auto-Encoder (UAE), a simple channel-wise baseline, to more sophisticated architectures such as BeatGAN and OmniAnomaly.
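To make the decomposition concrete, here is a minimal, self-contained sketch of how the three components might compose. The class and function names are hypothetical, the "model" is a toy stand-in rather than one of the paper's architectures, and the scorer shown is a static Gaussian variant (a dynamic variant appears below):

```python
import numpy as np

class MeanModel:
    """Toy stand-in for a trained detector: 'reconstructs' each timestamp
    with the channel-wise training mean (hypothetical, for illustration)."""
    def fit(self, x):
        self.mean_ = x.mean(axis=0)
    def errors(self, x):
        # per-timestamp, per-channel reconstruction error
        return np.abs(x - self.mean_)

def gaussian_scorer(train_err, test_err, eps=1e-8):
    """Static Gaussian scoring: negative log-likelihood (up to a constant)
    of test errors under a Gaussian fit to training errors, summed over channels."""
    mu, sigma = train_err.mean(axis=0), train_err.std(axis=0) + eps
    z = (test_err - mu) / sigma
    return (0.5 * z**2 + np.log(sigma)).sum(axis=-1)

def top_k_threshold(scores, anomaly_rate=0.01):
    """Thresholding function: flag the top `anomaly_rate` fraction of scores."""
    return scores >= np.quantile(scores, 1 - anomaly_rate)

# Pipeline: model -> scoring function -> thresholding function
train = np.random.randn(1000, 8)                               # 1000 timestamps, 8 channels
test = np.vstack([np.random.randn(500, 8), 5 + np.random.randn(20, 8)])
model = MeanModel()
model.fit(train)
scores = gaussian_scorer(model.errors(train), model.errors(test))
labels = top_k_threshold(scores)                               # boolean anomaly predictions
```

Because each component is swappable, the same model can be paired with different scorers and thresholders, which is exactly the experimental design the paper exploits.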
Notably, the paper emphasizes the role of scoring functions, which convert a model's raw output (e.g., reconstruction or prediction errors) into anomaly scores, and shows that dynamic scoring functions, such as dynamic Gaussian scoring, outperform their static counterparts. This separation of concerns between model and scoring allows a deeper understanding of which factors actually drive performance in anomaly detection pipelines.
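As a rough illustration of the dynamic idea, the sketch below re-estimates the Gaussian parameters from a sliding window of recent test-time errors instead of fixing them from training data, so the reference distribution adapts as the signal drifts. The window length is an assumed hyperparameter, not a value taken from the paper:

```python
import numpy as np

def dynamic_gaussian_score(test_err, window=200, eps=1e-8):
    """Score each timestamp by its negative log-likelihood (up to a constant)
    under a per-channel Gaussian fit to the preceding `window` errors.
    `test_err` has shape (timestamps, channels)."""
    T, _ = test_err.shape
    scores = np.zeros(T)
    for t in range(T):
        # reference window of recent errors; fall back to the first point at t=0
        ref = test_err[max(0, t - window):t] if t > 0 else test_err[:1]
        mu = ref.mean(axis=0)
        sigma = ref.std(axis=0) + eps
        z = (test_err[t] - mu) / sigma
        scores[t] = (0.5 * z**2 + np.log(sigma)).sum()
    return scores
```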
Dataset Characteristics
The evaluation employed seven diverse, publicly available datasets representing real-world CPS scenarios, including water treatment and spacecraft telemetry. Each dataset comprises multi-channel time series in which anomalies were either deliberately induced or labeled by domain experts. This breadth of coverage supports the robustness of the findings across application domains.
Evaluation Metrics
Critically, the paper challenges and extends existing evaluation practice. It introduces the composite F-score (Fc1), designed to balance event-wise recall with point-wise precision, addressing limitations of existing metrics such as the point-adjusted F1 score, which can be overly optimistic because detecting even a single point of a long anomalous event counts the entire event as detected.
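Under that definition, Fc1 is the harmonic mean of point-wise precision and event-wise recall, where an anomalous event (a maximal run of consecutive anomalous timestamps) counts as recalled if at least one of its points is flagged. A minimal sketch, assuming binary label arrays over timestamps:

```python
import numpy as np

def composite_f1(y_true, y_pred):
    """Composite F-score (Fc1): harmonic mean of point-wise precision
    and event-wise recall over binary timestamp labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    # Point-wise precision over individual timestamps
    tp = np.sum(y_true & y_pred)
    precision = tp / max(np.sum(y_pred), 1)
    # Event-wise recall: events are maximal runs of consecutive 1s;
    # an event is recalled if any of its points is flagged.
    padded = np.concatenate([[0], y_true.astype(int), [0]])
    starts = np.where(np.diff(padded) == 1)[0]
    ends = np.where(np.diff(padded) == -1)[0]
    recalled = sum(y_pred[s:e].any() for s, e in zip(starts, ends))
    recall = recalled / max(len(starts), 1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a predictor that flags one point inside each of two true events but also raises many false alarms gets full event-wise recall yet low point-wise precision, so Fc1 penalizes it, unlike a point-adjusted score.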
Key Observations
- Scoring Function Superiority: Dynamic scoring functions, which adapt their reference distribution during the test phase, detected anomalies significantly better than static scoring functions.
- Model Insights: Contrary to the expectation that complex architectures outperform simpler ones, the paper finds that the UAE, a simple channel-wise model, achieved the best detection and diagnosis performance when paired with a dynamic Gaussian scoring function (a sketch of the channel-wise idea follows this list). This suggests that, for the temporal anomalies prevalent in the CPS datasets evaluated, lightweight models may suffice, reducing computational burden without sacrificing accuracy.
- Metric Reliability: Through comparisons using the proposed Fc1 metric, the paper highlights inadequacies in traditional metrics and calls for a reevaluation of benchmarking practices in the MVTS anomaly detection literature.
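For concreteness, here is a hedged sketch of the channel-wise idea behind UAE: one small fully-connected autoencoder per channel, with reconstruction errors collected per channel for downstream scoring and diagnosis. The layer sizes and window length are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class UnivariateAE(nn.Module):
    """One small fully-connected autoencoder per channel (UAE-style).
    Layer sizes are illustrative, not the paper's exact architecture."""
    def __init__(self, window=100, hidden=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(window, 32), nn.Tanh(), nn.Linear(32, hidden))
        self.decoder = nn.Sequential(
            nn.Linear(hidden, 32), nn.Tanh(), nn.Linear(32, window))

    def forward(self, x):
        # x: (batch, window) slice of a single channel
        return self.decoder(self.encoder(x))

def channelwise_errors(models, windows):
    """Per-channel reconstruction errors. `windows` has shape
    (batch, window, channels); `models` holds one UnivariateAE per channel."""
    errs = []
    for c, model in enumerate(models):
        x = windows[:, :, c]
        errs.append((model(x) - x).abs().mean(dim=1))
    return torch.stack(errs, dim=1)  # (batch, channels)

# e.g., models = [UnivariateAE() for _ in range(n_channels)], trained per channel
```

Because errors stay separated per channel, diagnosing *which* channel caused an alarm falls out of the representation for free, which is part of why this simple design performs well at diagnosis.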
Implications and Future Directions
Practically, this research suggests revisiting, and potentially simplifying, current anomaly detection frameworks in CPS environments, with UAE plus dynamic scoring serving as a strong baseline. Theoretically, it opens avenues to explore why simpler models perform well in these settings and whether the result transfers to other anomaly types or to datasets with strong cross-channel dependencies.
Future research might explore hybrid models that combine UAE's strong per-channel temporal modeling with inter-channel anomaly detection, extending applicability to datasets with more complex state changes or cross-channel dependencies. Further investigation of the composite Fc1 metric across domains could validate its robustness and encourage the development of even more nuanced evaluation frameworks.
In conclusion, this paper provides a valuable resource for researchers and practitioners by offering insights into the performance dynamics of anomaly detection methods and reshaping the metrics used to evaluate them. It underscores the necessity of choosing appropriate scoring functions and metrics to truly capture the efficacy of anomaly detection algorithms in MVTS.