Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis (2310.13669v1)
Abstract: The advent of large pre-trained Language Models in the domain of Code Synthesis has shown remarkable performance on various benchmarks, treating the problem of Code Generation in a fashion similar to Natural Language Generation, trained with a Language Modelling (LM) objective. In addition, programming language code can be precisely evaluated with respect to its semantics -- through the use of Unit Tests to check its functional correctness -- which lends itself to using Reinforcement Learning (RL) as a further training paradigm. Previous work has shown that RL can be applied in this way to improve models' coding capabilities; however, such RL-based methods rely on a reward signal based on defined Unit Tests, which are much harder to obtain compared to the huge crawled code datasets used for the LM objective. In this work, we present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests, suitable for RL training of Code Synthesis models. We also introduce a simple yet effective Actor-Critic RL training scheme and show that, in conjunction with automatically generated training data, it improves a pre-trained code LM's performance by up to 9.9% over the original underlying code synthesis LM, and by up to 4.3% over RL-based models trained with standard PPO or CodeRL.
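The abstract describes deriving an RL reward signal from Unit Tests attached to a function signature. A minimal sketch of how such a reward could be computed is shown below; the function name, the string-based test format, and the pass-fraction reward are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a unit-test-based reward for RL code synthesis.
# Assumption: each candidate is a self-contained function definition and
# each unit test is a single `assert` statement over that function.

def unit_test_reward(candidate_code: str, unit_tests: list[str]) -> float:
    """Run a generated program against its unit tests and return the
    fraction of tests that pass, i.e. a reward in [0, 1]. A candidate
    that fails to even define itself (syntax/runtime error) earns 0."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # define the candidate function
    except Exception:
        return 0.0  # broken program: no partial credit

    passed = 0
    for test in unit_tests:
        try:
            exec(test, namespace)  # an AssertionError means the test failed
            passed += 1
        except Exception:
            pass
    return passed / len(unit_tests)


# Example: a function signature/body paired with generated unit tests,
# one of which the candidate fails.
candidate = "def add(a, b):\n    return a + b\n"
tests = [
    "assert add(1, 2) == 3",
    "assert add(-1, 1) == 0",
    "assert add(0, 0) == 1",  # deliberately failing test
]
reward = unit_test_reward(candidate, tests)  # 2 of 3 tests pass -> 2/3
```

In an actor-critic setup like the one the abstract describes, this scalar would serve as the terminal reward for a sampled program, with the critic estimating its expected value to reduce gradient variance.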
- A survey of machine learning for big code and naturalness. ACM Computing Surveys (CSUR), 51(4):1–37.
- Program synthesis with large language models. arXiv preprint arXiv:2108.07732.
- Richard Bellman. 1957. A Markovian decision process. Indiana Univ. Math. J., 6(4):679–684.
- On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
- CodeT: Code generation with generated tests. In The Eleventh International Conference on Learning Representations.
- Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
- PanGu-Coder: Program synthesis with function-level language modeling.
- BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, volume 1, page 2.
- The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027.
- Matthew Hausknecht and Nolan Wagener. 2022. Consistent dropout for policy gradient reinforcement learning.
- Measuring coding challenge competence with APPS. arXiv preprint arXiv:2105.09938.
- The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
- Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285.
- Pre-trained contextual embedding of source code.
- Interactive code generation via test-driven user-intent formalization. arXiv preprint arXiv:2208.05950.
- CodeRL: Mastering code generation through pretrained models and deep reinforcement learning. Advances in Neural Information Processing Systems, 35:21314–21328.
- Competition-level Code Generation with AlphaCode.
- CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664.
- CodeGen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474.
- Improving language understanding by generative pre-training. Preprint.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
- Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551.
- Unsupervised translation of programming languages. Advances in Neural Information Processing Systems, 33:20601–20611.
- A survey of evaluation metrics used for NLG systems. ACM Computing Surveys (CSUR), 55(2):1–39.
- High-dimensional continuous control using generalized advantage estimation.
- Proximal policy optimization algorithms. CoRR, abs/1707.06347.
- Richard S Sutton and Andrew G Barto. 2018. Reinforcement learning: An introduction. MIT press.
- StructCoder: Structure-aware transformer for code generation. arXiv preprint arXiv:2206.05239.
- Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Compilable neural code generation with compiler feedback. arXiv preprint arXiv:2203.05132.
- CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv preprint arXiv:2109.00859.
- Fine-tuning language models from human preferences.
- Philip John Gorinski
- Matthieu Zimmer
- Gerasimos Lampouras
- Derrick Goh Xin Deik
- Ignacio Iacobacci