
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection

Published 9 Jan 2024 in cs.LG, cs.AI, and cs.SE | (2401.04749v1)

Abstract: Log anomaly detection is a key component in the field of artificial intelligence for IT operations (AIOps). Considering log data of variant domains, retraining the whole network for unknown domains is inefficient in real industrial scenarios. However, previous deep models merely focused on extracting the semantics of log sequences in the same domain, leading to poor generalization on multi-domain logs. To alleviate this issue, we propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains, where we establish a two-stage process including the pre-training and adapter-based tuning stage. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters. Besides, the Log-Attention module is proposed to supplement the information ignored by log parsing. The proposed method is evaluated on three public datasets and one real-world dataset. Experimental results on multiple benchmarks demonstrate the effectiveness of our LogFormer with fewer trainable parameters and lower training costs.


Summary

  • The paper proposes a two-stage framework that pre-trains a Transformer-based model on source log data and adapts it using lightweight adapters for improved anomaly detection.
  • It introduces a novel Log-Attention module that preserves critical semantic information lost during log parsing, enhancing detection accuracy.
  • Experimental results on public and real-world datasets show that LogFormer achieves state-of-the-art performance with fewer trainable parameters and lower training costs.

LogFormer: Pre-training and Tuning Pipeline for Log Anomaly Detection

This paper introduces LogFormer, a two-stage framework for log anomaly detection designed to improve generalization across different domains. The core idea is to pre-train a Transformer-based model on a source domain to capture shared semantic knowledge of log data, and then adapt this knowledge to a target domain using adapter-based tuning. The authors also introduce a Log-Attention module to supplement information lost during log parsing. The proposed method is evaluated on three public datasets and one real-world dataset, demonstrating its effectiveness with fewer trainable parameters and lower training costs.

Addressing Log Anomaly Detection Challenges

Log anomaly detection is crucial for spotting irregular behavior in large-scale IT systems. Traditional methods struggle with the growing volume of log data and the semantic complexity of log messages. Existing deep learning methods often rely on log parsing to extract templates, which can discard the semantic information carried by the variable fields (a toy illustration of what parsing keeps and drops follows Figure 1 below). Furthermore, these methods typically focus on single-domain logs, limiting their ability to generalize to new domains or to keep up with the continuous iteration of log data. LogFormer addresses these challenges by preserving semantic knowledge shared between domains and by avoiding information loss through a novel Log-Attention module. The paper highlights the shared semantic space across different domains (Figure 1), which motivates the pre-training approach.

Figure 1: The same anomaly from multiple domains.
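To make the information-loss issue concrete, here is a toy illustration of what a parsed template keeps and what it drops. It assumes a Drain-style parser and an HDFS-like log line; neither the line nor the snippet comes from the paper.

```python
# Toy illustration of log parsing output (Drain-style), not from the paper.
raw_log = "Received block blk_3587508140051953248 of size 67108864 from /10.251.42.84"

# The parser keeps the constant "keywords" as a template ...
template = "Received block <*> of size <*> from <*>"
# ... and the variable fields become parameters, which many pipelines discard.
parameters = ["blk_3587508140051953248", "67108864", "/10.251.42.84"]

# LogFormer's Log-Attention module is motivated by keeping this `parameters`
# signal instead of throwing it away after parsing.
print(template)
print(parameters)
```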

LogFormer Architecture and Methodology

LogFormer's architecture consists of two main stages: pre-training and adapter-based tuning (Figure 2). The pre-training stage involves training a Transformer-based model with a Log-Attention module on a source domain to acquire common semantic knowledge from log sequences. The Log-Attention module is designed to incorporate information from parameters that are typically discarded during log parsing. In the adapter-based tuning stage, the pre-trained model is adapted to the target domain by adding lightweight adapters to the encoder layers. Only the parameters of the adapters are updated during this stage, while the parameters of the pre-trained model are frozen, enabling efficient knowledge transfer with minimal training costs.
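To make the two-stage process concrete, below is a minimal PyTorch sketch of the parameter-efficiency argument: every weight is trainable during source-domain pre-training, and only the adapters (and a task head) are updated during target-domain tuning. The class, attribute names, and dimensions (`LogEncoder`, `adapter`, `d_model=256`) are illustrative assumptions, not the authors' released code; for brevity the adapters here sit after each encoder layer, whereas the paper's parallel placement inside the sublayers is sketched later (after Figure 4).

```python
import torch
import torch.nn as nn

# Hypothetical backbone over log-sequence embeddings; names and shapes are assumptions.
class LogEncoder(nn.Module):
    def __init__(self, d_model=256, n_layers=4, n_heads=4, n_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.adapter = nn.ModuleList(  # one lightweight bottleneck adapter per layer
            nn.Sequential(nn.Linear(d_model, 32), nn.GELU(), nn.Linear(32, d_model))
            for _ in range(n_layers)
        )
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):  # x: (batch, seq_len, d_model) embedded log sequences
        for layer, adapter in zip(self.encoder.layers, self.adapter):
            x = layer(x)
            x = x + adapter(x)              # residual adapter (simplified placement)
        return self.head(x.mean(dim=1))     # sequence-level anomaly logits

model = LogEncoder()

# Stage 1: pre-training on the source domain -- all parameters are trainable.
pretrain_opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stage 2: adapter-based tuning on the target domain --
# freeze the shared backbone; only adapters and the task head are updated.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(("adapter", "head"))

tuning_opt = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```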

Key Components: Log-Attention and Adapter-Based Tuning

The Log-Attention module is a key innovation in LogFormer, designed to counter the information loss caused by log parsing. After parsing, the module encodes the parameters of each log sequence with a linear layer and assigns a learnable scalar to each output, which serves as a bias term in self-attention (Figure 3). This allows the model to aggregate both keyword and parameter information, improving its ability to detect anomalies (a rough sketch of this bias mechanism follows the figure captions below).

Figure 2: Logs and Templates.


Figure 3: Log-Attention.
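Reading the description above literally, the bias could be realized roughly as follows: the parsed-out parameters of each log line are encoded by a linear layer, scaled by a learnable scalar, and added to the self-attention scores. This is a schematic sketch under assumed shapes and names (`LogAttention`, `d_param`), not the authors' implementation.

```python
import torch
import torch.nn as nn

class LogAttention(nn.Module):
    """Self-attention with an additive bias derived from the parameter tokens
    that log parsing strips out. A rough reading of the paper's description;
    module names and tensor shapes are assumptions."""
    def __init__(self, d_model=256, d_param=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.param_proj = nn.Linear(d_param, 1)    # encode parameters of each log line
        self.scale = nn.Parameter(torch.zeros(1))  # learnable scalar on the bias

    def forward(self, x, param_feats):
        # x:           (batch, seq_len, d_model)  template/keyword embeddings
        # param_feats: (batch, seq_len, d_param)  embeddings of the parsed-out parameters
        bias = self.scale * self.param_proj(param_feats).squeeze(-1)   # (batch, seq_len)
        # Broadcast the per-position bias onto the attention score matrix:
        # every query attends to key j with an extra additive term bias[j].
        attn_mask = bias.unsqueeze(1).expand(-1, x.size(1), -1)        # (batch, L, L)
        attn_mask = attn_mask.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(x, x, x, attn_mask=attn_mask)
        return out
```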

Adapter-based tuning is the other key ingredient of LogFormer, enabling efficient knowledge transfer from the source domain to the target domain. Adapters are inserted in parallel to the Log-Attention layer and the feedforward layer (Figure 4); a schematic sketch of this placement follows the figure below. The parallel design lets each adapter work on the same input the original encoder sublayer receives, while the pre-trained encoder itself stays intact. By updating only the adapter parameters during target-domain adaptation, LogFormer significantly reduces the number of trainable parameters and lowers training costs compared to fine-tuning the entire model.

Figure 4: Encoder with Adapters.
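The parallel placement could look roughly like the sketch below: each adapter is a small bottleneck that receives the same input as the attention or feedforward sublayer, and its output is added into the residual stream. Module names and dimensions are assumptions for illustration; `attn` here is a plain multi-head attention standing in for the Log-Attention module.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project."""
    def __init__(self, d_model=256, d_bottleneck=32):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, x):
        return self.up(self.act(self.down(x)))

class AdapterEncoderLayer(nn.Module):
    """One encoder layer with adapters in parallel to the attention and
    feedforward sublayers, following the description of Figure 4. A schematic
    re-implementation under assumed shapes, not the authors' code."""
    def __init__(self, d_model=256, n_heads=4, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.adapter_attn = Adapter(d_model)
        self.adapter_ff = Adapter(d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Parallel placement: each adapter sees the same input as its sublayer,
        # and both outputs are summed into the residual stream.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out + self.adapter_attn(x))
        x = self.norm2(x + self.ff(x) + self.adapter_ff(x))
        return x
```

With a bottleneck of 32 and a model width of 256, each adapter adds roughly 16k weights per sublayer, a small fraction of a full encoder layer, which is consistent with the paper's emphasis on few trainable parameters.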

Experimental Results and Analysis

The authors conducted extensive experiments on three public datasets (HDFS, BGL, and Thunderbird) and one real-world dataset (GAIA) to evaluate LogFormer. The results show that LogFormer achieves state-of-the-art performance on all three public benchmarks with fewer trainable parameters and lower training costs than existing methods. Ablation studies assess the impact of pre-training, adapter-based tuning, and the Log-Attention module; they confirm the effectiveness of each component and highlight the benefits of the two-stage training approach. In particular, fine-tuning converges faster than training from scratch, which shows that the knowledge learned from the source domain is valuable. LogFormer also achieves a slightly higher $F_1$ score (about 1% on average) than directly fine-tuning the pre-trained model on two of the datasets.

Impact of Pre-training and Low-Resource Performance

The paper also analyzes how pre-training affects convergence speed and performance. The results show that pre-training accelerates convergence and improves the model's initial performance, demonstrating the value of transferring knowledge from the source domain. LogFormer's behavior in low-resource settings, with fewer than 20k training examples, was assessed as well: it still delivers acceptable results, underscoring how parameter-efficient the approach is for log analysis. The training loss curves (Figure 5) illustrate the faster convergence achieved through pre-training.

Figure 5: Loss and $F_1$ score on the test set.

Practical Application and Generalization

LogFormer has been deployed at a cloud service company, demonstrating its practical utility in real-world scenarios. On the GAIA dataset, a real-world dataset collected from a distributed system, LogFormer achieves the best performance among the compared baselines, highlighting its ability to generalize to complex, multi-domain, and continuously evolving data. The model has been running stably for over 3,000 hours on this system, further demonstrating its robustness and reliability.

Conclusion

LogFormer presents a novel and effective approach to log anomaly detection, addressing the challenges of generalization and information loss in existing methods. The two-stage pre-training and adapter-based tuning pipeline, combined with the Log-Attention module, enables LogFormer to achieve state-of-the-art performance with fewer trainable parameters and lower training costs. The experimental results and practical application demonstrate the potential of LogFormer for real-world deployment in large-scale IT systems. Further research could explore different adapter architectures, pre-training objectives, and applications to other anomaly detection tasks.
