
Log Summarisation for Defect Evolution Analysis (2403.08358v1)

Published 13 Mar 2024 in cs.SE and cs.CL

Abstract: Log analysis and monitoring are essential aspects of software maintenance and defect identification. In particular, the temporal nature and vast size of log data lead to an interesting and important research question: How can logs be summarised and monitored over time? While this has been a fundamental topic of research in the software engineering community, work has typically focused on heuristic-, syntax-, or static-based methods. In this work, we suggest an online semantic-based clustering approach for error logs that dynamically updates the log clusters to enable monitoring of code error life-cycles. We also introduce a novel metric to evaluate the performance of temporal log clusters. We test our system and evaluation metric on an industrial dataset and find that our solution outperforms similar systems. We hope that our work encourages further temporal exploration of defect datasets.
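
To make the idea of online clustering with dynamically updated clusters concrete, here is a minimal sketch. It is not the paper's method: the abstract does not specify the embedding model, similarity measure, or update rule, so everything below is an assumption chosen for illustration. In particular, `embed` uses plain bag-of-words counts as a stand-in for a semantic embedding, and `OnlineLogClusterer`, its `threshold` parameter, and the `first_seen`/`last_seen` fields are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(message):
    """Stand-in embedding (assumption): lowercase bag-of-words token counts.
    A semantic approach would likely use a sentence-embedding model instead."""
    return Counter(re.findall(r"[a-z]+", message.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineLogClusterer:
    """Hypothetical single-pass clusterer: each incoming error log joins the
    most similar existing cluster, or starts a new one if none is close enough."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # minimum similarity to join a cluster (assumed value)
        self.clusters = []

    def add(self, timestamp, message):
        vec = embed(message)
        best_id, best_sim = None, 0.0
        for i, cluster in enumerate(self.clusters):
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best_id, best_sim = i, sim

        if best_id is not None and best_sim >= self.threshold:
            cluster = self.clusters[best_id]
            # Summed counts act as the centroid; cosine ignores vector scale.
            cluster["centroid"].update(vec)
            cluster["size"] += 1
            cluster["last_seen"] = timestamp
            return best_id

        # No sufficiently similar cluster: open a new one.
        self.clusters.append(
            {"centroid": vec, "size": 1, "first_seen": timestamp, "last_seen": timestamp}
        )
        return len(self.clusters) - 1
```

Feeding timestamped error messages to `add` one at a time yields a cluster id per message, and the `first_seen` and `last_seen` fields give a rough view of how long each error type has been active, which is one simple way to think about an error life-cycle.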

