An Overview of the Potential of LLMs as Zero-Shot Anomaly Detectors for Time Series
The paper "LLMs can be zero-shot anomaly detectors for time series?" explores the applicability of large language models (LLMs) to time series anomaly detection, a challenging task typically dominated by specialized deep learning models. The authors propose a novel framework named SigLLM and present two methodologies, Prompter and Detector, that leverage the inherent capabilities of LLMs in a zero-shot context.
The goal is to explore whether LLMs, whose autoregressive training makes them adept at text-based tasks, can be adapted to detect anomalies in time series data without prior task-specific training. This adaptation requires not only working with a novel data format but also performing a task qualitatively distinct from traditional applications of LLMs.
Methodological Framework: SigLLM
SigLLM serves as the backbone for transforming time series data into formats compatible with text-based LLMs. The framework involves several preprocessing steps, such as scaling, quantization, and tokenization, to convert time series into a format that can be ingested by the LLMs. Additionally, rolling windows are employed to manage data exceeding LLMs’ context length limitations, ensuring computational feasibility.
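This kind of preprocessing can be sketched in a few lines. The function below is illustrative only: the parameter names and exact scaling/quantization choices are assumptions, not the paper's precise configuration.

```python
import numpy as np

def preprocess(series, num_digits=3, window=100, step=50):
    """Sketch of SigLLM-style preprocessing: scale, quantize, window.

    Parameter names and exact settings here are illustrative, not the
    paper's exact configuration.
    """
    series = np.asarray(series, dtype=float)
    # Shift and scale so every value is a non-negative fraction of the range,
    # allowing each point to be written as a plain unsigned integer string.
    shifted = series - series.min()
    scaled = shifted / (shifted.max() + 1e-12)
    # Quantize to a fixed number of digits (e.g. 0..999 for 3 digits).
    quantized = np.round(scaled * (10**num_digits - 1)).astype(int)
    # Rolling windows keep each prompt within the model's context limit.
    windows = [quantized[i:i + window]
               for i in range(0, len(quantized) - window + 1, step)]
    # Render each window as comma-separated text an LLM can ingest.
    return [",".join(map(str, w)) for w in windows]
```

Each returned string can then be tokenized and fed to the model as an ordinary text prompt.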
- Prompter: This method engages LLMs through carefully engineered prompts to identify anomalies within a sequence of time series data. It capitalizes on the LLM's ability to generate context-based outputs, although trials with different prompt engineering strategies showed that it struggles to avoid false positives.
- Detector: This approach leverages LLMs’ forecasting capabilities, utilizing residual errors between predicted and actual values to identify anomalies. The method exploits LLMs’ strengths in predicting sequential data while circumventing traditional learning phases.
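The Detector idea, forecasting and then flagging large residuals, can be sketched with generic residual thresholding. This is a simplified stand-in, not the paper's exact post-processing pipeline, and the threshold parameter is an assumption:

```python
import numpy as np

def detect_anomalies(actual, predicted, z_thresh=3.0):
    """Flag points whose forecast residual is unusually large.

    Generic residual-thresholding sketch; the paper's actual
    post-processing may differ.
    """
    residuals = np.abs(np.asarray(actual) - np.asarray(predicted))
    # Standardize residuals and flag those beyond z_thresh standard deviations.
    mu, sigma = residuals.mean(), residuals.std()
    z = (residuals - mu) / (sigma + 1e-12)
    return np.where(z > z_thresh)[0]
```

Here `predicted` would come from the LLM's zero-shot forecast of the signal; any point whose prediction error stands out from the rest is reported as anomalous.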
Empirical Results and Analysis
The authors evaluate SigLLM's performance against existing state-of-the-art anomaly detection models across 492 signals from 11 diverse datasets. The evaluation metrics are precision, recall, and F1 score. The results suggest that while LLMs can outperform some transformer-based models such as AnomalyTransformer, they are still not on par with classical statistical methods like ARIMA or deep learning frameworks like AER and LSTM DT.
The authors report that the Detector method generally outperformed Prompter, achieving an average F1 score of 0.525. Nonetheless, the performance of LLMs was about 30% lower than the best-performing deep learning models, underscoring an existing gap in efficacy.
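For reference, the way these three metrics combine can be shown with a point-wise computation over anomaly indices. This is a simplification: the benchmark used in the paper scores whole anomalous events rather than individual points.

```python
def point_f1(true_idx, pred_idx):
    """Point-wise precision, recall, and F1 over anomaly indices.

    Simplified: the paper's benchmark scores anomalous events, but the
    point-wise version shows how the metrics combine.
    """
    true_set, pred_set = set(true_idx), set(pred_idx)
    tp = len(true_set & pred_set)  # correctly flagged points
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A method that over-flags points gains recall but loses precision, which is why the high false positive rate of Prompter drags its F1 score down.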
Implications and Future Directions
The paper highlights promising insights into the potential of LLMs for expanding into new domains beyond text, suggesting a future where AI models could operate seamlessly across multiple data modalities. Despite the current limitations, such as high false positive rates and considerable inference times, advances in LLM architectures and post-processing techniques could bridge existing performance gaps.
Future research could focus on:
- Enhancing LLM architectures to handle larger contexts directly, eliminating the need for rolling windows.
- Improving post-processing techniques to better filter false positives, especially within the Prompter method.
- Investigating more sophisticated anomaly detection models that integrate the forecasting strengths of LLMs with other machine learning techniques.
In conclusion, while LLMs have demonstrated some potential for zero-shot time series anomaly detection, significant advancements are required before they can rival established anomaly detection methods. Nonetheless, their capacity to operate without task-specific training presents an exciting avenue for future exploration.