Xpert: Empowering Incident Management with Query Recommendations via Large Language Models (2312.11988v1)

Published 19 Dec 2023 in cs.SE, cs.AI, and cs.PL

Abstract: Large-scale cloud systems play a pivotal role in modern IT infrastructure. However, incidents occurring within these systems can lead to service disruptions and adversely affect user experience. To resolve such incidents swiftly, on-call engineers depend on crafting domain-specific language (DSL) queries to analyze telemetry data. Writing these queries, however, can be challenging and time-consuming. This paper presents a thorough empirical study of the use of KQL queries, where KQL is a DSL employed for incident management in a large-scale cloud management system at Microsoft. The findings underscore the importance and viability of KQL query recommendation for enhancing incident management. Building upon these insights, we introduce Xpert, an end-to-end machine learning framework that automates the KQL recommendation process. By leveraging historical incident data and LLMs, Xpert generates customized KQL queries tailored to new incidents. Furthermore, Xpert incorporates a novel performance metric called Xcore, enabling a thorough evaluation of query quality from three comprehensive perspectives. We conduct extensive evaluations of Xpert, demonstrating its effectiveness in offline settings. Notably, we deploy Xpert in the real production environment of a large-scale incident management system at Microsoft, validating its efficiency in supporting incident management. To the best of our knowledge, this paper represents the first empirical study of its kind, and Xpert stands as a pioneering DSL query recommendation framework designed for incident management.
