- The paper introduces a novel agent-based anomaly detection system that employs LLMs for autonomous rule generation on time-series data.
- It leverages a multi-stage pipeline with Detection, Repair, and Review Agents to ensure explainable, reproducible, and accurate rule generation.
- Evaluation on public and internal datasets shows significant F1 score improvements, demonstrating enhanced performance over traditional methods.
Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via LLMs
Introduction
The paper introduces Argos, an agentic system for anomaly detection in time-series data within cloud infrastructures that uses LLMs to generate detection rules autonomously. Argos targets three properties that existing approaches rarely achieve together: explainability, reproducibility, and autonomy.
System Design and Architecture
Argos leverages a multi-stage design, comprising data preprocessing, rule training, and deployment phases. The key components of Argos are:
- Data Preprocessor: Scales, indexes, and tokenizes input data for efficient processing within the context of time-series anomaly detection.
- Training Engine: Implements an agent-based pipeline with Detection, Repair, and Review Agents, ensuring the generation of syntactically correct and accurate anomaly detection rules.
  - Detection Agent: Proposes rules in Python based on input data.
  - Repair Agent: Corrects syntax errors in proposed rules.
  - Review Agent: Evaluates and iterates rules to improve accuracy.
- Deployment Components: Include an Anomaly Detector and Aggregator, combining outputs from both base detectors and LLM-generated rules to ensure accuracy and resource efficiency.
Figure 1: The overall design of Argos.
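To make the Data Preprocessor step concrete, here is a minimal sketch of what scaling, indexing, and tokenizing a series for an LLM prompt might look like. The function name `preprocess`, the `levels` parameter, and the `index,value` text format are assumptions for illustration; the summary does not specify how Argos encodes its input.

```python
def preprocess(values, levels=1000):
    """Hypothetical preprocessing: scale, index, and render a series as text.

    Min-max scales each value to an integer in [0, levels - 1], pairs it with
    its position, and emits one "index,value" line per point so the series
    can be placed in a prompt compactly.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant series: avoid division by zero
    scaled = [round((v - lo) / span * (levels - 1)) for v in values]
    return "\n".join(f"{i},{s}" for i, s in enumerate(scaled))


if __name__ == "__main__":
    print(preprocess([0.0, 5.0, 10.0], levels=11))
```

Scaling to small integers keeps the text short and removes unit differences between metrics, which is one plausible reason to scale before tokenizing.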
Autonomous Rule Generation
Argos distinguishes itself through autonomous rule generation via LLMs. The Detection Agent generates executable Python code for anomaly detection rules, bridging the gap between domain-specific expertise and machine-generated logic. Argos integrates existing LLM techniques so that the generated rules are both explainable and reproducible, while the system remains adaptable to varying anomaly patterns.
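As an illustration of what an executable rule of this kind could look like, the sketch below flags points that deviate strongly from the median of the preceding window. It is a hypothetical example, not a rule produced by Argos; the function name, the parameters `window`, `threshold`, and `min_scale`, and the 0/1 output format are all assumptions.

```python
from statistics import median


def detect_anomalies(values, window=5, threshold=3.0, min_scale=1.0):
    """Hypothetical rule: flag points far from the median of the prior window.

    Returns a list of 0/1 flags, one per input point.
    """
    flags = []
    for i, v in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            flags.append(0)  # not enough context yet
            continue
        base = median(history)
        # Median absolute deviation of the window, floored so that a
        # perfectly flat window does not make every tiny change anomalous.
        mad = median(abs(h - base) for h in history)
        scale = max(mad, min_scale)
        flags.append(1 if abs(v - base) / scale > threshold else 0)
    return flags


if __name__ == "__main__":
    print(detect_anomalies([10, 10, 11, 10, 10, 50, 10, 10]))
```

A rule in this form is plain code: it can be read line by line, and it returns the same flags every time it is run on the same input.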
Correctness and Accuracy
Argos employs iterative feedback loops between the Repair and Review Agents to improve rule accuracy and correctness. This approach is inspired by backpropagation, ensuring the continuous improvement of anomaly detection rules through systematic error correction and performance evaluation.
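The feedback loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: `propose`, `repair`, and `review` are hypothetical callables standing in for the LLM-backed Detection, Repair, and Review Agents, and the round limits are arbitrary. Only the syntax check uses a real facility, Python's standard `ast` module.

```python
import ast


def syntax_error(rule_source):
    """Return a syntax error message for a candidate rule, or None if it parses."""
    try:
        ast.parse(rule_source)
        return None
    except SyntaxError as exc:
        return f"{exc.msg} (line {exc.lineno})"


def train_rule(propose, repair, review, max_rounds=3, max_repairs=3):
    """Hypothetical propose / repair / review loop.

    propose(feedback) -> str            stands in for the Detection Agent
    repair(source, error) -> str        stands in for the Repair Agent
    review(source) -> (bool, str)       stands in for the Review Agent
    Returns the accepted rule source, or None if no rule was accepted.
    """
    feedback = None
    for _ in range(max_rounds):
        source = propose(feedback)
        # Repair until the rule parses or the repair budget runs out.
        for _ in range(max_repairs):
            error = syntax_error(source)
            if error is None:
                break
            source = repair(source, error)
        if syntax_error(source) is not None:
            feedback = "rule could not be repaired"
            continue
        # Review the syntactically valid rule; feedback drives the next round.
        accepted, feedback = review(source)
        if accepted:
            return source
    return None
```

Separating the syntax check from the accuracy review means the reviewer only ever sees rules that can actually run.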
Model Fusion for Accuracy Guarantee
The model fusion strategy in Argos combines LLM-generated rules with existing well-tuned anomaly detectors to guarantee accuracy improvements, so that the combined system matches, and often exceeds, the performance of traditional models.
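One simple way to obtain such a guarantee, at least on held-out validation data, is to keep a fused output only when it scores strictly better than the base detector alone. The sketch below assumes this selection scheme and three candidate modes (`base`, `or`, `and`); these are illustrative choices, and the actual fusion logic in Argos may differ.

```python
def f1_score(pred, truth):
    """F1 over binary 0/1 labels; 0.0 when there are no positives at all."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def choose_fusion(base_flags, rule_flags, truth):
    """Hypothetical fusion selection on validation data.

    Compares the base detector alone with OR- and AND-fusion of base and
    rule outputs, and returns (mode, f1) for the best one. The base mode is
    evaluated first and replaced only by a strictly better score, so the
    chosen mode never scores below the base detector on this data.
    """
    candidates = {
        "base": list(base_flags),
        "or": [b | r for b, r in zip(base_flags, rule_flags)],
        "and": [b & r for b, r in zip(base_flags, rule_flags)],
    }
    best_mode, best_f1 = "base", f1_score(candidates["base"], truth)
    for mode in ("or", "and"):
        score = f1_score(candidates[mode], truth)
        if score > best_f1:
            best_mode, best_f1 = mode, score
    return best_mode, best_f1
```

Note that this guarantee holds only for the data used for selection; performance on unseen data still depends on how representative that validation set is.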
Evaluation
Argos was evaluated on public datasets such as KPI and Yahoo, as well as an internal Microsoft dataset. The results show a significant improvement in F1 scores compared to state-of-the-art methods, with up to a 9.5-point increase on public datasets and a 28.3-point increase on internal datasets. These evaluations underscore Argos' effectiveness in addressing the challenges of time-series anomaly detection.
Figure 2: Comparison of the correctness rate and average test F1 score of the Training Engine with only the Detection Agent versus full Training Engine with Repair and Review Agents.
Conclusion
Argos represents a substantial advancement in time-series anomaly detection, effectively addressing the triad of explainability, reproducibility, and autonomy. Through the autonomous generation of detection rules via LLMs, Argos provides an efficient, adaptable, and robust solution for anomaly detection in cloud infrastructures. The system's design ensures higher accuracy and efficiency, making it a valuable tool for enhancing the reliability of cloud services. Future directions may focus on expanding Argos’ applications to other domains and integrating more sophisticated model fusion techniques to further improve its performance.