- The paper introduces a novel anomaly-attention mechanism that exploits association discrepancy to differentiate anomalies from regular time points.
- It integrates a learnable Gaussian kernel for prior-association and refines series-association via self-attention weights to improve detection accuracy.
- Empirical results on six benchmarks, including SMAP and PSM, show state-of-the-art precision, recall, and F1-scores across diverse anomaly types.
The paper introduces the Anomaly Transformer, a method for unsupervised time series anomaly detection that leverages the Transformer's strength in modeling temporal associations. The cornerstone of the method is the identification and use of association discrepancy as the criterion for distinguishing normal time points from anomalies.
Traditionally, unsupervised anomaly detection in time series has been hampered by the difficulty of deriving an informative, distinguishable criterion from complex temporal dynamics. Classical approaches such as the local outlier factor (LOF) or one-class SVM largely ignore temporal structure, limiting their effectiveness in real-world scenarios. Deep learning models have advanced the field through representation learning, notably reconstruction-based and autoregression-based paradigms. However, these rely mainly on pointwise representations and pointwise reconstruction or prediction errors, which offer limited contextual information and struggle to single out rare anomalies.
In contrast, the Anomaly Transformer employs a novel anomaly-attention mechanism designed to expose the difference between the associations formed by anomalous and normal time points. The paper argues that, because anomalies are rare, they struggle to build informative associations with the whole series and instead concentrate their associations on adjacent time points. This gap between a point's learned global associations and its adjacency-concentrated ones is formalized as the "association discrepancy," which serves as a new anomaly detection criterion.
Technically, the Anomaly Transformer replaces standard self-attention with anomaly-attention, which computes two association maps in each layer: a prior-association and a series-association. The prior-association is modeled by a learnable Gaussian kernel over relative temporal distance, capturing the adjacency bias expected of anomalies, while the series-association is taken from the learned self-attention weights and reflects each time point's broader association profile.
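To make the two branches concrete, the following is a minimal, single-head PyTorch-style sketch of anomaly-attention. The class name, tensor shapes, and the per-position learnable scale produced by the `sigma` projection are illustrative assumptions that follow the paper's description, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AnomalyAttention(nn.Module):
    """Single-head sketch of anomaly-attention (illustrative, not the official code).

    For a length-L window it produces two row-stochastic association maps:
      * prior:  a Gaussian kernel over relative distance |i - j| with a learnable,
                per-position scale sigma, capturing the adjacency bias.
      * series: ordinary softmax self-attention weights, i.e. the learned global
                association profile of each time point.
    """

    def __init__(self, d_model: int, win_len: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.sigma = nn.Linear(d_model, 1)  # learnable per-position Gaussian scale
        idx = torch.arange(win_len)
        # fixed |i - j| distance matrix for the window
        self.register_buffer("dist", (idx[None, :] - idx[:, None]).abs().float())

    def forward(self, x: torch.Tensor):
        # x: (batch, L, d_model)
        B, L, D = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)

        # series-association: standard scaled dot-product attention weights
        scores = q @ k.transpose(-2, -1) / D ** 0.5        # (B, L, L)
        series = F.softmax(scores, dim=-1)

        # prior-association: learnable Gaussian kernel over temporal distance
        sigma = F.softplus(self.sigma(x)) + 1e-5           # (B, L, 1), strictly positive
        gauss = torch.exp(-self.dist[None] ** 2 / (2 * sigma ** 2))
        prior = gauss / gauss.sum(dim=-1, keepdim=True)    # row-normalized distribution

        out = series @ v                                   # feeds the reconstruction branch
        return out, prior, series
```

Row-normalizing the Gaussian kernel keeps the prior-association a proper distribution over time points, making it directly comparable to the softmax series-association in the divergence described next.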
The model incorporates a minimax strategy to amplify the discriminative power of the association discrepancy, which is quantified as the symmetrized KL divergence between the prior- and series-associations. In the minimize phase the prior-association is driven toward the observed series-association, while in the maximize phase the series-association is pushed to enlarge the discrepancy by attending beyond adjacent points; this is easy for normal points but difficult for anomalies, whose reconstruction depends on their immediate neighborhood, so the gap between the two becomes more pronounced.
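The discrepancy and the two-phase objective can be sketched as follows. The function names, the weighting hyperparameter `lam`, and the exact placement of the stop-gradients are assumptions intended to mirror the described strategy (the minimize phase adapts the prior toward the observed series-association; the maximize phase pushes the series-association to enlarge the discrepancy), rather than a verbatim reproduction of the training loop.

```python
import torch
import torch.nn.functional as F


def sym_kl(p: torch.Tensor, s: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Symmetrized KL divergence between two row-stochastic association maps.

    p, s: (batch, L, L). Returns a per-point discrepancy of shape (batch, L):
    KL(p || s) + KL(s || p), summed over the association dimension.
    """
    kl_ps = (p * (torch.log(p + eps) - torch.log(s + eps))).sum(dim=-1)
    kl_sp = (s * (torch.log(s + eps) - torch.log(p + eps))).sum(dim=-1)
    return kl_ps + kl_sp


def minimax_losses(x, x_rec, prior, series, lam: float = 3.0):
    """Two objectives of the minimax association-discrepancy strategy (sketch).

    * minimize phase: reconstruction error plus lam * discrepancy, with the
      series-association detached so only the prior (its Gaussian scale) adapts
      toward the observed associations.
    * maximize phase: reconstruction error minus lam * discrepancy, with the
      prior detached so the series-association is pushed away from the
      adjacency-biased prior (easy for normal points, hard for anomalies).
    """
    rec = F.mse_loss(x_rec, x)
    loss_min = rec + lam * sym_kl(prior, series.detach()).mean()  # update prior branch
    loss_max = rec - lam * sym_kl(prior.detach(), series).mean()  # update series branch
    return loss_min, loss_max
```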
The empirical validation spans six standard benchmarks drawn from applications such as server monitoring, space exploration telemetry, and water treatment facilities. The Anomaly Transformer outperforms a broad set of prior methods, with consistent improvements in precision, recall, and F1-score. For instance, on SMAP and PSM it achieves F1-scores of 96.69% and 97.89%, respectively.
Furthermore, the Anomaly Transformer is robust across anomaly types, as validated on the NeurIPS-TS benchmark, where it detects point-global, point-contextual, pattern-shapelet, pattern-seasonal, and pattern-trend anomalies more accurately than the compared baselines.
In conclusion, the Anomaly Transformer's ability to model and exploit association discrepancy marks a significant advance in unsupervised time series anomaly detection. The approach sets a precedent for using the Transformer's capacity for temporal and relational modeling, with promising directions in improving computational efficiency and in theoretical analyses that connect the association discrepancy to classical paradigms such as autoregression, which would strengthen the model's interpretability and support deployment across broader domains.