
UniParser: A Unified Log Parser for Heterogeneous Log Data (2202.06569v1)

Published 14 Feb 2022 in cs.SE

Abstract: Logs provide first-hand information for engineers to diagnose failures in large-scale online service systems. Log parsing, which transforms semi-structured raw log messages into structured data, is a prerequisite of automated log analysis such as log-based anomaly detection and diagnosis. Almost all existing log parsers follow the general idea of extracting the common part as templates and the dynamic part as parameters. However, these log parsing methods often neglect the semantic meaning of log messages. Furthermore, high diversity among various log sources also poses an obstacle to the generalization of log parsing across different systems. In this paper, we propose UniParser to capture the common logging behaviours from heterogeneous log data. UniParser utilizes a Token Encoder module and a Context Encoder module to learn the patterns from the log token and its neighbouring context. A Context Similarity module is specially designed to model the commonalities of learned patterns. We have performed extensive experiments on 16 public log datasets and our results show that UniParser outperforms state-of-the-art log parsers by a large margin.
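
To make the template/parameter split mentioned in the abstract concrete, here is a minimal Python sketch. It is not UniParser's method: it uses a hypothetical hand-written heuristic (any token containing a digit is treated as a parameter) purely to illustrate what "common part as template, dynamic part as parameters" means. The function name `parse_log` and the `<*>` placeholder are assumptions chosen for illustration.

```python
import re

# Assumed heuristic for illustration only: a token containing a digit
# is treated as a dynamic parameter; everything else is template text.
PARAM_TOKEN = re.compile(r"\d")

def parse_log(message):
    """Split a raw log message into a template string and a parameter list."""
    template_tokens = []
    parameters = []
    for token in message.split():
        if PARAM_TOKEN.search(token):
            template_tokens.append("<*>")  # placeholder for the dynamic part
            parameters.append(token)
        else:
            template_tokens.append(token)
    return " ".join(template_tokens), parameters

template, params = parse_log("Connection from 10.0.0.5 closed after 32 ms")
print(template)  # Connection from <*> closed after <*> ms
print(params)    # ['10.0.0.5', '32']
```

A rule like this ignores what tokens mean: a parameter without digits, such as a user name, would be left in the template. That is the kind of limitation the abstract attributes to parsers that neglect the semantics of log messages.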

Authors (12)
  1. Yudong Liu (31 papers)
  2. Xu Zhang (343 papers)
  3. Shilin He (25 papers)
  4. Hongyu Zhang (147 papers)
  5. Liqun Li (12 papers)
  6. Yu Kang (61 papers)
  7. Yong Xu (432 papers)
  8. Minghua Ma (33 papers)
  9. Qingwei Lin (81 papers)
  10. Yingnong Dang (10 papers)
  11. Saravan Rajmohan (85 papers)
  12. Dongmei Zhang (193 papers)
Citations (79)
