Leveraging Reinforcement Learning and Large Language Models for Code Optimization (2312.05657v1)

Published 9 Dec 2023 in cs.LG, cs.AI, cs.PL, and cs.SE

Abstract: Code optimization is a daunting task that requires a significant level of expertise from experienced programmers, and such expertise cannot keep pace with the rapid development of new hardware architectures. To advance the code optimization process, recent approaches rely on machine learning and artificial intelligence techniques. This paper introduces a new framework to decrease the complexity of code optimization. The proposed framework builds on LLMs and reinforcement learning (RL) and enables LLMs to receive feedback from their environment (i.e., unit tests) during the fine-tuning process. We compare our framework with existing state-of-the-art models and show that it is more efficient with respect to speed and computational usage, as a result of the reduced number of training steps and its applicability to models with fewer parameters. Additionally, our framework reduces the possibility of logical and syntactical errors. To evaluate our approach, we run several experiments on the PIE dataset using a CodeT5 LLM and RRHF, a new reinforcement learning algorithm. We adopt a variety of evaluation metrics with regard to optimization quality and speedup. The evaluation results demonstrate that the proposed framework achieves results similar to existing models while using shorter training times and smaller pre-trained models. In particular, we achieve increases of 5.6% and 2.2 over the baseline models on the %OPT and SP metrics, respectively.
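The abstract describes fine-tuning a CodeT5 model with RRHF, where candidate optimized programs are ranked by feedback from unit tests rather than by a learned reward model. As a rough illustration (not the authors' code), the sketch below shows an RRHF-style objective under the assumption that each sampled candidate is scored by executing its unit tests and timing it: the model's length-normalized log-likelihoods are pushed to agree with that reward ranking, and a cross-entropy term keeps imitating the best-scoring candidate. The function name `rrhf_loss` and the exact reward scheme (failed candidates score zero, passing candidates score their measured speedup) are assumptions made for this example.

```python
# Minimal sketch of an RRHF-style objective for code optimization (assumed, not the paper's code).
import torch

def rrhf_loss(seq_logprobs: torch.Tensor,
              seq_lengths: torch.Tensor,
              rewards: torch.Tensor,
              best_ce_loss: torch.Tensor) -> torch.Tensor:
    """Combine a reward-ranking loss with an SFT term.

    seq_logprobs: sum of token log-probs per candidate program, shape (k,)
    seq_lengths:  token counts per candidate, shape (k,)
    rewards:      unit-test / speedup scores per candidate, shape (k,)
    best_ce_loss: cross-entropy of the highest-reward candidate (scalar)
    """
    # Length-normalized log-likelihood p_i of each candidate under the model.
    p = seq_logprobs / seq_lengths
    # Pairwise differences: entry [i, j] holds p_i - p_j and r_i - r_j.
    diff_p = p.unsqueeze(1) - p.unsqueeze(0)
    diff_r = rewards.unsqueeze(1) - rewards.unsqueeze(0)
    # Ranking term: penalize whenever the model scores a lower-reward
    # candidate (r_i < r_j) higher than a higher-reward one (p_i > p_j).
    rank_loss = torch.relu(diff_p[diff_r < 0]).sum()
    # SFT term: keep imitating the best-scoring candidate.
    return rank_loss + best_ce_loss
```

A hypothetical call with four sampled rewrites of one slow program might look like `rrhf_loss(torch.tensor([-45.0, -60.0, -50.0, -70.0]), torch.tensor([30.0, 40.0, 35.0, 50.0]), torch.tensor([1.8, 0.0, 2.5, 0.0]), torch.tensor(1.2))`, where a reward of 0.0 marks candidates that failed their unit tests. The appeal of this formulation, as the abstract suggests, is that the ranking signal comes directly from executing the generated code, so training needs only sampled candidates and their test results rather than a separate reward model.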

References (26)
  1. Training a helpful and harmless assistant with reinforcement learning from human feedback, 2022.
  2. Learning to superoptimize programs. CoRR, abs/1611.01787, 2016. URL http://arxiv.org/abs/1611.01787.
  3. Evaluating large language models trained on code, 2021.
  4. ProGraML: Graph-based deep learning for program optimization and analysis, 2020.
  5. The three pillars of machine programming. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, pp.  69–80, 2018a.
  6. The three pillars of machine programming. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, MAPL 2018, pp.  69–80, New York, NY, USA, 2018b. Association for Computing Machinery. ISBN 9781450358347. doi: 10.1145/3211346.3211355. URL https://doi.org/10.1145/3211346.3211355.
  7. Measuring coding challenge competence with APPS. NeurIPS, 2021.
  8. CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436, 2019.
  9. CodeRL: Mastering code generation through pretrained models and deep reinforcement learning. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=WaGvb7OzySA.
  10. RLTF: Reinforcement learning from unit test feedback, 2023.
  11. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. CoRR, abs/2102.04664, 2021.
  12. Learning performance-improving code edits. arXiv preprint arXiv:2302.07867, 2023.
  13. CodeGen2: Lessons for training LLMs on programming and natural languages. ICLR, 2023a.
  14. CodeGen: An open large language model for code with multi-turn program synthesis. ICLR, 2023b.
  15. CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks, 2021.
  16. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020. URL http://jmlr.org/papers/v21/20-074.html.
  17. Proximal policy optimization algorithms, 2017.
  18. PanGu-Coder2: Boosting large language models for code with ranking feedback, 2023.
  19. Execution-based code generation using deep reinforcement learning, 2023.
  20. Preference ranking optimization for human alignment, 2023.
  21. Learning to summarize from human feedback. CoRR, abs/2009.01325, 2020. URL https://arxiv.org/abs/2009.01325.
  22. Attention is all you need. CoRR, abs/1706.03762, 2017. URL http://arxiv.org/abs/1706.03762.
  23. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp.  8696–8708, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.685. URL https://aclanthology.org/2021.emnlp-main.685.
  24. Reinforcement learning from diverse human preferences, 2023.
  25. RRHF: Rank responses to align language models with human feedback without tears, 2023.
  26. Siren’s song in the AI ocean: A survey on hallucination in large language models, 2023.
Authors (11)
  1. Shukai Duan (11 papers)
  2. Nikos Kanakaris (9 papers)
  3. Xiongye Xiao (16 papers)
  4. Heng Ping (9 papers)
  5. Chenyu Zhou (15 papers)
  6. Nesreen K. Ahmed (76 papers)
  7. Guixiang Ma (20 papers)
  8. Mihai Capota (9 papers)
  9. Theodore L. Willke (21 papers)
  10. Shahin Nazarian (31 papers)
  11. Paul Bogdan (51 papers)
Citations (2)