Finding Missed Code Size Optimizations in Compilers using LLMs
This paper explores a novel application of LLMs: identifying missed code size optimizations in compilers. Compiler testing has traditionally focused on functional correctness rather than optimization quality, so inefficient code generation, particularly with respect to code size, often goes undetected. The authors introduce an LLM-driven, mutation-based testing methodology that effectively surfaces such missed optimizations.
Traditional compiler testing and fuzzing tools, such as CSmith, rely on complex, language-specific random program generators that are resource-intensive to build and maintain, and they typically produce large, hard-to-interpret test cases. In contrast, the authors start from a trivial seed program and use an LLM to mutate it incrementally, sidestepping the need for an elaborate generator and keeping test cases small and readable.
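A minimal sketch of that mutation loop in Python might look like the following; the `ask_llm` helper, the prompt wording, and the use of the `cc` and binutils `size` command-line tools are illustrative assumptions, not the authors' actual implementation.

```python
import os
import subprocess
import tempfile

SEED = "int main(void) { return 0; }\n"

def compiled_size(source, flags=("-Os",), compiler="cc"):
    """Compile a C source string and return the size of its text section (bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        src, obj = os.path.join(tmp, "test.c"), os.path.join(tmp, "test.o")
        with open(src, "w") as f:
            f.write(source)
        subprocess.run([compiler, *flags, "-c", src, "-o", obj], check=True)
        out = subprocess.run(["size", obj], capture_output=True, text=True, check=True)
        # binutils `size` prints a header line, then: text data bss dec hex filename
        return int(out.stdout.splitlines()[1].split()[0])

def ask_llm(prompt):
    """Placeholder for an LLM API call; the real system would send `prompt`
    to a model and return the mutated program text."""
    raise NotImplementedError

def mutate(source):
    """Ask the model for a small incremental mutation of the current program."""
    return ask_llm("Extend this C program with a small amount of new code:\n" + source)

program = SEED
for _ in range(10):                   # a short mutation chain per seed
    program = mutate(program)
    size = compiled_size(program)     # each mutant then feeds the differential oracles
```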
The paper articulates four differential testing strategies for identifying missed optimizations (sketched in code after the list):
- Dead Code Differential Testing: Checks whether mutations consisting of provably dead code alter the size of the compiled output. If adding dead code changes the emitted size, the compiler failed to eliminate it, and a missed optimization is flagged.
- Optimization Pipeline Differential Testing: Compares output sizes across optimization levels (e.g., -O3 vs. -Oz). If the size-oriented pipeline emits larger code than the speed-oriented one, a potential inefficiency is flagged.
- Single-Compiler Differential Testing: Compares different versions of the same compiler to identify code size regressions introduced between releases.
- Multi-Compiler Differential Testing: Compiles the same input with different compilers. A significant discrepancy in code size hints at a missed optimization opportunity in the compiler producing the larger output.
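Under the same assumptions, the four oracles can be sketched as simple size comparisons built on the `compiled_size` helper from the earlier snippet; the compiler names, flags, and the 1.5x threshold below are placeholders rather than the paper's exact settings.

```python
def dead_code_oracle(original, mutant_with_dead_code):
    """Dead code differential: inserting provably dead code should not grow the binary."""
    return compiled_size(mutant_with_dead_code) > compiled_size(original)

def pipeline_oracle(source):
    """Optimization pipeline differential: -Oz should never emit more code than -O3."""
    return compiled_size(source, ("-Oz",)) > compiled_size(source, ("-O3",))

def single_compiler_oracle(source, old="gcc-13", new="gcc-14"):
    """Single-compiler differential: a newer release emitting larger code is a regression."""
    return compiled_size(source, compiler=new) > compiled_size(source, compiler=old)

def multi_compiler_oracle(source, compilers=("gcc", "clang"), ratio=1.5):
    """Multi-compiler differential: a large size gap between compilers on the same input."""
    sizes = [compiled_size(source, compiler=c) for c in compilers]
    return max(sizes) > ratio * min(sizes)
```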
The methodology identified 24 bugs in production C/C++, Swift, and Rust compilers, demonstrating its efficacy. For example, it found and confirmed a GCC bug in which value range analysis failed to eliminate dead control structures. It also surfaced optimizations that were applied inconsistently across compiler versions or that diverged from the results of comparable optimization pipelines.
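To give a flavor of the value-range class of bug (a hypothetical example in the spirit of the reported issue, not the authors' actual reproducer), a branch that range analysis can prove unreachable should cost nothing in the emitted code; the check below reuses the helpers sketched earlier.

```python
WITH_DEAD_BRANCH = """
int f(int x) {
    if (x > 10) {
        if (x < 5)      /* unreachable: range analysis knows x > 10 here */
            return 100;
        return 1;
    }
    return 0;
}
"""

SIMPLIFIED = """
int f(int x) {
    if (x > 10)
        return 1;
    return 0;
}
"""

# If the provably dead branch survives into the object file, the two sizes
# differ and the dead-code oracle above flags a missed optimization.
if compiled_size(WITH_DEAD_BRANCH) > compiled_size(SIMPLIFIED):
    print("potential missed code size optimization")
```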
A significant contribution of this work is its extensibility across programming languages. Because the LLM handles program mutation, adapting the framework to a new language largely amounts to adjusting the test generation scripts, which the authors demonstrate for Rust and Swift.
While this paper focuses on code size optimizations, it opens the door to testing runtime performance and other optimization objectives. Improved heuristics, advances in LLM capability, and tighter coordination between the model and the testing harness all look promising for this research direction, and future work could explore richer prompt engineering and broader classes of compiler optimizations.
This research presents a significant step forward in compiler testing methodology by applying AI-driven techniques to optimization inefficiencies, an area that has historically received little structured examination. The approach's small implementation footprint and demonstrated ability to find real bugs point to practical applications and a new direction at the growing intersection of AI and compiler technology.