Improving Assembly Code Performance with LLMs via Reinforcement Learning
The paper "Improving Assembly Code Performance with LLMs via Reinforcement Learning" explores the application of LLMs in optimizing assembly code performance, utilizing reinforcement learning (RL). The investigation is driven by the potential for LLMs to enhance code efficiency in performance-critical domains, where fine-grained control at the assembly level can yield increments unattainable through high-level languages.
Context and Methodology
The traditional approach to assembly code optimization relies heavily on compilers, which apply a fixed sequence of rule-based transformation passes to improve runtime efficiency. These pipelines are fundamentally limited by the phase-ordering problem: the order in which optimization passes are applied affects the result, and finding a good ordering requires careful balancing and coordination of the passes. This work tackles the challenge by training an LLM with reinforcement learning, specifically Proximal Policy Optimization (PPO), a policy-gradient method known for stabilizing training.
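For intuition, here is a minimal sketch of the standard PPO clipped surrogate objective in PyTorch. This is a generic illustration, not the paper's training code; the tensor names and the clipping constant are assumptions.

```python
import torch

def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Generic PPO clipped surrogate loss (illustrative sketch, not the paper's code).

    new_logprobs / old_logprobs: log-probabilities of the sampled assembly tokens
    under the current and behavior policies; advantages: estimated advantages
    for those samples (e.g. reward minus a learned baseline).
    """
    ratio = torch.exp(new_logprobs - old_logprobs)                    # importance ratio
    unclipped = ratio * advantages                                    # vanilla policy-gradient term
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                      # maximize surrogate => minimize negative
```

The clipping keeps each update close to the behavior policy, which is the property that makes PPO comparatively stable for fine-tuning large models.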
The reward signal is built around two pivotal attributes: functional correctness, validated with test cases, and performance, measured against the industry-standard gcc -O3 optimization level. A custom dataset of 8,072 real-world C programs paired with compiler-generated assembly forms the backbone of training and provides a realistic environment for the optimization task.
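One hypothetical way such a reward could be shaped is sketched below: gate performance credit on passing all tests, and otherwise reward speedup over the -O3 baseline. The exact formulation in the paper may differ; the function and its constants are assumptions for illustration.

```python
def reward(passed_tests, total_tests, candidate_time, baseline_time):
    """Illustrative reward: correctness gate plus speedup over the gcc -O3 baseline.

    Hypothetical shaping, not the paper's exact reward definition.
    """
    if passed_tests < total_tests:
        # Any failing test: no performance credit, partial-credit penalty in [-1, 0)
        return passed_tests / total_tests - 1.0
    # All tests pass: reward scales with measured speedup (>1 means faster than -O3)
    return baseline_time / candidate_time
```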
Key Findings
The resulting model, Qwen2.5-Coder-7B-PPO, makes significant strides when evaluated against a benchmark suite:
- A test pass rate of 96.0%
- An average speedup of 1.47× over the gcc -O3 baseline
These results surpass those of 20 other models, including Claude-3.7-sonnet, underscoring the effectiveness of reinforcement learning for optimizing low-level code.
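As a rough sketch of how such a speedup comparison could be scripted (the paper's benchmarking harness is certainly more careful about inputs, isolation, and timing; the helper names and run count here are assumptions):

```python
import subprocess, time, statistics

def measure_runtime(binary, runs=10):
    """Median wall-clock runtime of a compiled binary over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([binary], check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def speedup_vs_o3(c_source, candidate_asm):
    """Compile the reference with gcc -O3 and the candidate assembly, then compare runtimes."""
    subprocess.run(["gcc", "-O3", c_source, "-o", "baseline"], check=True)
    subprocess.run(["gcc", candidate_asm, "-o", "candidate"], check=True)
    return measure_runtime("./baseline") / measure_runtime("./candidate")
```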
Implications and Future Directions
The application of LLMs to assembly code optimization has promising implications for both software development and AI. Practically, improving assembly performance while maintaining correctness could yield efficiency gains in domains where computational resources are tightly constrained. Theoretically, the paper underscores the ability of LLMs to discover optimization strategies beyond those encoded in traditional compilers.
The authors propose extending assembly optimization to other architectures (such as ARM or GPU targets) to assess how well the learned strategies generalize. Further investigation into alternative reinforcement learning algorithms, and the development of interactive refinement methodologies, could unlock additional capabilities of LLMs in code optimization tasks.
Conclusion
This research deepens our understanding of how reinforcement learning and LLMs can contribute to optimized code generation, especially in assembly programming. The demonstrated improvement over traditional compiler output marks a meaningful milestone for AI-driven code optimization. As LLM capabilities continue to evolve, further exploration of direct code compilation, diverse architectures, and enhanced learning mechanisms will likely yield more sophisticated and widespread applications in computing and software engineering.