- The paper presents an LLM-based framework (LogSage) that performs root cause analysis and automated remediation for CI/CD failures via token-efficient log processing.
- It employs key log filtering, expansion, and token pruning alongside retrieval-augmented generation to achieve over 80% accuracy while reducing token usage by approximately 85%.
- Industrial validation at ByteDance with over 1.07M CI/CD executions confirms LogSage's cost-efficiency, scalability, and robust performance in real-world settings.
LogSage Framework Overview
LogSage is an LLM-based framework for root cause analysis (RCA) and automated remediation of CI/CD pipeline failures, validated through deployment in an industrial setting. It targets three persistent challenges in CI/CD systems: detecting failures efficiently, identifying their root causes, and giving developers actionable fixes. It integrates LLM-based diagnosis into existing software engineering workflows, with an emphasis on precise diagnostics and scalability.
Figure 1: Overview of the LogSage framework, consisting of an offline preparation phase for log template deduplication and knowledge base construction, and an online operational phase for RCA and solution generation with execution.
Methodology
RCA Stage
The RCA stage uses a token-efficient log preprocessing pipeline that filters noise and extracts critical errors. It begins with Key Log Filtering, which uses log templates from successful runs to isolate candidate failure lines; the filter is deliberately not too aggressive, so that essential context is not lost. Key Log Expansion then adds the lines immediately before and after each identified error line to preserve context. Finally, Token Overflow Pruning enforces a token limit by prioritizing the most information-dense blocks, and the surviving text is used to build the RCA prompt.
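The three preprocessing steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: the masking rules in `to_template`, the context window sizes, the whitespace-based token count, and the definition of block density as the fraction of key lines are all assumptions made for the example.

```python
import re

def to_template(line: str) -> str:
    """Mask variable parts (hex ids, numbers) so similar lines share one template."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    return re.sub(r"\d+", "<NUM>", line).strip()

def filter_key_lines(log_lines, success_templates):
    """Key log filtering: indices of lines whose template never occurs in successful runs."""
    return [i for i, line in enumerate(log_lines)
            if to_template(line) not in success_templates]

def expand_blocks(key_idx, n_lines, before=2, after=1):
    """Key log expansion: add context around each key line, merging overlapping windows."""
    blocks = []
    for i in sorted(key_idx):
        start, end = max(0, i - before), min(n_lines - 1, i + after)
        if blocks and start <= blocks[-1][1] + 1:
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            blocks.append([start, end])
    return [tuple(b) for b in blocks]

def prune_blocks(blocks, key_idx, log_lines, token_budget):
    """Token overflow pruning: keep the densest blocks that fit in the budget."""
    keys = set(key_idx)

    def cost(block):
        # Assumption: whitespace-separated words stand in for real tokenizer counts.
        return sum(len(log_lines[i].split()) for i in range(block[0], block[1] + 1))

    def density(block):
        # Assumption: density = share of lines in the block that are key lines.
        hits = sum(1 for i in range(block[0], block[1] + 1) if i in keys)
        return hits / (block[1] - block[0] + 1)

    kept, used = [], 0
    for block in sorted(blocks, key=density, reverse=True):
        c = cost(block)
        if used + c <= token_budget:
            kept.append(block)
            used += c
    return sorted(kept)  # restore log order for the prompt
```

In this sketch, the set of success templates would be built offline by applying `to_template` to logs from passing runs; the blocks returned by `prune_blocks` would then be concatenated, in log order, into the RCA prompt.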
Solution Generation
LogSage's solution generation uses retrieval-augmented generation (RAG): it retrieves historical fixes from enterprise knowledge bases and synthesizes them into executable remediation strategies. The framework takes the RCA report as input and prompts the model to choose suitable tool-based fixes, which are then applied automatically to restore the failing CI/CD pipeline.
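A rough sketch of the retrieval and prompt-assembly step is shown below. It is illustrative only: the `FixRecord` fields, the tool names, and the prompt wording are hypothetical, and word-overlap (Jaccard) similarity stands in for whatever retriever the real system uses.

```python
from dataclasses import dataclass

@dataclass
class FixRecord:
    symptom: str  # description of a past root cause
    tool: str     # hypothetical remediation tool name
    note: str     # short human-written remark about the fix

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for an embedding-based retriever."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def retrieve(rca_report: str, knowledge_base, k: int = 2):
    """Return the k historical fixes whose symptoms best match the RCA report."""
    ranked = sorted(knowledge_base,
                    key=lambda rec: jaccard(rca_report, rec.symptom),
                    reverse=True)
    return ranked[:k]

def build_prompt(rca_report: str, hits, tools) -> str:
    """Assemble the prompt that asks the model to pick a tool-based fix."""
    lines = ["Root cause report:", rca_report, "", "Similar past fixes:"]
    lines += [f"- {h.symptom} -> {h.tool} ({h.note})" for h in hits]
    lines += ["", "Available tools: " + ", ".join(tools),
              "Choose one tool and its arguments."]
    return "\n".join(lines)
```

The model's reply would name one of the listed tools, and a separate executor (not shown) would run it against the failing pipeline.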
Figure 2: Token usage for RCA across methods and LLMs.
Figure 3: Query rounds for RCA across methods and LLMs.
Experimental Evaluation
LogSage was evaluated against several LLM-based baselines, including LogPrompt and LogGPT. The metrics are precision, recall, and F1-score, measured across multiple LLMs, and the results show LogSage outperforming the baselines. It also uses approximately 85% fewer tokens than competing methods while maintaining high accuracy, which lowers cost and shortens processing time.
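For reference, the reported metrics follow the standard definitions, sketched below. How the paper counts true and false positives for RCA is not stated in this summary, so the counts passed in are placeholders, and `token_reduction` is a hypothetical helper for the relative saving against a baseline.

```python
def prf1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def token_reduction(baseline_tokens: int, method_tokens: int) -> float:
    """Fraction of tokens saved relative to a baseline method."""
    return 1 - method_tokens / baseline_tokens
```

With illustrative numbers (not taken from the paper), a method that uses 150 tokens where a baseline uses 1,000 has a reduction of 0.85, which is the scale of saving the summary reports.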
Industrial Validation
LogSage was deployed in production at ByteDance, where it handled over 1.07 million CI/CD executions while maintaining high user adoption and coverage. In deployment, its RCA accuracy exceeded 80%. End users rated LogSage favorably, with feedback highlighting its effective solution suggestions and its automation of remedial actions.
Figure 4: Weekly active user count and coverage rate.
Figure 5: Human-Annotated Online Accuracy.
Conclusion and Future Work
LogSage points toward more autonomous and adaptive LLM-agent systems that can iteratively manage, and preemptively resolve, DevOps problems. Future work could include predictive failure modeling and tighter integration with observability tools, with the aim of a more responsive software engineering environment. Its scalability and readiness for integration make it a practical addition to CI/CD systems and a step toward more intelligent automation in software delivery.