LogSage: An LLM-Based Framework for CI/CD Failure Detection and Remediation with Industrial Validation (2506.03691v2)

Published 4 Jun 2025 in cs.SE

Abstract: Continuous Integration and Deployment (CI/CD) pipelines are critical to modern software engineering, yet diagnosing and resolving their failures remains complex and labor-intensive. We present LogSage, the first end-to-end LLM-powered framework for root cause analysis (RCA) and automated remediation of CI/CD failures. LogSage employs a token-efficient log preprocessing pipeline to filter noise and extract critical errors, then performs structured diagnostic prompting for accurate RCA. For solution generation, it leverages retrieval-augmented generation (RAG) to reuse historical fixes and invokes automation fixes via LLM tool-calling. On a newly curated benchmark of 367 GitHub CI/CD failures, LogSage achieves over 98% precision, near-perfect recall, and an F1 improvement of more than 38 percentage points in the RCA stage, compared with recent LLM-based baselines. In a year-long industrial deployment at ByteDance, it processed over 1.07M executions, with end-to-end precision exceeding 80%. These results demonstrate that LogSage provides a scalable and practical solution for automating CI/CD failure management in real-world DevOps workflows.

Summary

The paper presents an LLM-based framework (LogSage) that performs root cause analysis and automated remediation for CI/CD failures via token-efficient log processing.
It combines key log filtering, key log expansion, and token overflow pruning with retrieval-augmented generation, reporting end-to-end precision above 80% in deployment while using approximately 85% fewer tokens than competing methods.
Industrial validation at ByteDance with over 1.07M CI/CD executions confirms LogSage's cost-efficiency, scalability, and robust performance in real-world settings.

LogSage Framework Overview

LogSage is an LLM-based framework for root cause analysis (RCA) and automated remediation of CI/CD pipeline failures, validated through deployment in an industrial setting. It targets three persistent challenges in CI/CD systems: detecting failures efficiently, identifying their root causes, and providing actionable solutions to developers. It integrates LLM techniques into existing software engineering workflows, with the aim of delivering precise diagnostics at scale.

Figure 1: Overview of the LogSage framework, consisting of an offline preparation phase for log template deduplication and knowledge base construction, and an online operational phase for RCA and solution generation with execution.

Methodology

RCA Stage

The RCA stage of LogSage uses a token-efficient log preprocessing pipeline to filter noise and extract critical errors. It begins with Key Log Filtering, which uses log templates from successful runs to isolate potential failure indicators, without filtering so aggressively that essential context is lost. Key Log Expansion then restores context by selecting the lines immediately before and after each identified error line. Finally, Token Overflow Pruning enforces a token limit, prioritizing high-density blocks so that the most informative content reaches the RCA prompt.
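The three preprocessing steps described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the template masking rules, the window sizes, the whitespace token counter, and the density score (key lines per block line) are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import re

def to_template(line):
    """Mask variable parts (hex values, numbers) so similar lines share one template."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line.strip()

def filter_key_lines(failed_log, success_templates):
    """Key Log Filtering: indices of lines whose template never occurs in successful runs."""
    return [i for i, line in enumerate(failed_log)
            if to_template(line) not in success_templates]

def expand_blocks(key_indices, n_lines, before=2, after=2):
    """Key Log Expansion: add context around each key line, merging overlapping windows."""
    blocks = []
    for i in sorted(key_indices):
        start, end = max(0, i - before), min(n_lines - 1, i + after)
        if blocks and start <= blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return blocks

def count_tokens(text):
    """Crude whitespace count (assumption); a real system would use the LLM's tokenizer."""
    return len(text.split())

def prune_blocks(failed_log, blocks, key_indices, budget):
    """Token Overflow Pruning: keep the densest blocks that fit within the token budget."""
    keys = set(key_indices)

    def density(block):
        start, end = block
        hits = sum(1 for i in range(start, end + 1) if i in keys)
        return hits / (end - start + 1)

    kept, used = [], 0
    for block in sorted(blocks, key=density, reverse=True):
        cost = sum(count_tokens(failed_log[i]) for i in range(block[0], block[1] + 1))
        if used + cost <= budget:
            kept.append(block)
            used += cost
    return sorted(kept)  # restore original log order

def preprocess(failed_log, success_logs, budget=200, before=2, after=2):
    """Run filtering, expansion, and pruning; return the lines to place in the RCA prompt."""
    success_templates = {to_template(l) for log in success_logs for l in log}
    keys = filter_key_lines(failed_log, success_templates)
    blocks = expand_blocks(keys, len(failed_log), before, after)
    kept = prune_blocks(failed_log, blocks, keys, budget)
    return [failed_log[i] for s, e in kept for i in range(s, e + 1)]
```

In this sketch, a line from the failed run is treated as a key line only if its masked template is absent from every successful run, so routine output that differs only in timings or identifiers is filtered out while unfamiliar lines are kept along with their neighbors.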

Solution Generation

LogSage's solution generation uses a retrieval-augmented generation (RAG) mechanism that queries enterprise knowledge bases for historical fixes and synthesizes them into executable remediation strategies. The framework takes the RCA report as input and prompts the model to choose suitable tool-based fixes, which are then applied automatically to restore the failing CI/CD pipeline.
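The retrieval step can be sketched in Python as follows. This is an illustration only: the `FixRecord` structure, the Jaccard token-overlap ranking (standing in for whatever retriever the real system uses), the prompt wording, and the tool names are all hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass

@dataclass
class FixRecord:
    """Hypothetical knowledge-base entry: a past root cause and the fix that resolved it."""
    root_cause: str
    fix: str

def tokenize(text):
    """Lowercase word set used for a simple lexical similarity."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def retrieve(rca_report, knowledge_base, k=2):
    """Return the k historical fixes whose root cause best matches the RCA report."""
    query = tokenize(rca_report)
    ranked = sorted(knowledge_base,
                    key=lambda r: jaccard(query, tokenize(r.root_cause)),
                    reverse=True)
    return ranked[:k]

def build_prompt(rca_report, records, tools):
    """Assemble a prompt asking the model to pick one of the available tools."""
    history = "\n".join(f"- cause: {r.root_cause} | fix: {r.fix}" for r in records)
    tool_list = ", ".join(tools)
    return (f"RCA report:\n{rca_report}\n\n"
            f"Similar historical fixes:\n{history}\n\n"
            f"Choose one tool from [{tool_list}] and give its arguments.")
```

A call such as `build_prompt(report, retrieve(report, kb), ["retry_job", "clear_cache"])` would then produce the text sent to the model, whose tool choice is executed by the pipeline. A production retriever would more likely rank with embeddings than with token overlap.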

Figure 2: Token usage for RCA across methods and LLMs.

Figure 3: Query rounds for RCA across methods and LLMs.

Experimental Evaluation

LogSage was evaluated against several LLM-based baseline methods, including LogPrompt and LogGPT. The metrics were precision, recall, and F1-score, measured across multiple LLMs, and LogSage outperformed the baselines on them. It also consumes far fewer tokens: the reported data show approximately 85% lower token usage than competing methods at high accuracy, which translates into lower cost and faster processing.
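For reference, the metrics named above follow their standard definitions, and a relative token reduction can be computed as shown. The sketch below is generic Python; the counts in the usage note are made-up illustrative numbers, not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def token_reduction(baseline_tokens, method_tokens):
    """Fraction of tokens saved relative to a baseline (0.85 means 85% fewer tokens)."""
    return 1 - method_tokens / baseline_tokens
```

With illustrative counts of 8 true positives, 2 false positives, and 0 false negatives, `precision_recall_f1(8, 2, 0)` gives a precision of 0.8 and a recall of 1.0. A method that uses 150 tokens where a baseline uses 1,000 has a reduction of 0.85.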

Industrial Validation

LogSage was deployed in production at ByteDance for a year, processing over 1.07 million CI/CD executions while maintaining high user adoption and coverage. The deployment demonstrated the system's practical utility, with end-to-end precision exceeding 80%. End users rated LogSage favorably, with feedback highlighting its ability to suggest effective solutions and automate remedial actions.

Figure 4: Weekly active user count and coverage rate.

Figure 5: Human-Annotated Online Accuracy.

Conclusion and Future Work

LogSage points toward more autonomous and adaptive LLM-agent systems that can manage DevOps failures iteratively and resolve them preemptively. Future work could include predictive failure modeling and tighter integration with observability tools, with the aim of a more responsive software engineering environment. Its scalability and readiness for integration make LogSage a practical addition to CI/CD systems and a step toward more intelligent automation in software delivery.