Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis (2310.13669v1)

Published 20 Oct 2023 in cs.LG, cs.AI, cs.CL, and cs.PL

Abstract: The advent of large pre-trained language models (LLMs) in the domain of Code Synthesis has shown remarkable performance on various benchmarks, treating the problem of Code Generation in a fashion similar to Natural Language Generation, trained with a Language Modelling (LM) objective. In addition, the property of programming language code being precisely evaluable with respect to its semantics -- through the use of Unit Tests to check its functional correctness -- lends itself to using Reinforcement Learning (RL) as a further training paradigm. Previous work has shown that RL can be applied as such to improve models' coding capabilities; however, such RL-based methods rely on a reward signal based on defined Unit Tests, which are much harder to obtain compared to the huge crawled code datasets used in LM objectives. In this work, we present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests, suitable for RL training of Code Synthesis models. We also introduce a straightforward yet effective Actor-Critic RL training scheme and show that it, in conjunction with automatically generated training data, improves a pre-trained code LLM's performance by up to 9.9% over the original underlying code synthesis LM, and by up to 4.3% over RL-based models trained with standard PPO or CodeRL.
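
To make the abstract's training loop concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of its two ingredients: turning automatically obtained unit tests into a scalar reward by executing a sampled program, and a generic one-step Actor-Critic policy-gradient update driven by that reward. The names run_unit_tests and actor_critic_loss, the value_coef weighting, and the exact loss form are all illustrative assumptions; the paper's actual scheme may differ in detail.

import torch
import torch.nn.functional as F

def run_unit_tests(candidate_src: str, tests: list[tuple[str, object]]) -> float:
    """Execute a sampled program and return the fraction of unit tests it
    passes; code that does not run at all earns zero reward."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)       # define the candidate function
    except Exception:
        return 0.0
    passed = 0
    for call_expr, expected in tests:
        try:
            if eval(call_expr, namespace) == expected:
                passed += 1
        except Exception:
            pass                             # a raising test case counts as failed
    return passed / len(tests)

def actor_critic_loss(log_probs: torch.Tensor,
                      values: torch.Tensor,
                      rewards: torch.Tensor,
                      value_coef: float = 0.5) -> torch.Tensor:
    """Generic Actor-Critic step (an assumption, not the paper's exact loss):
    log_probs are summed token log-probabilities of each sampled program
    under the code LM (the actor), values are the critic's scalar estimates,
    rewards are unit-test pass rates."""
    advantage = rewards - values.detach()    # better-than-expected signal
    policy_loss = -(advantage * log_probs).mean()
    value_loss = F.mse_loss(values, rewards) # fit critic to observed rewards
    return policy_loss + value_coef * value_loss

# Example: an automatically generated (signature, unit tests) pair scored
# against a model completion; reward is 1.0 when every test passes.
candidate = "def add(a, b):\n    return a + b\n"
tests = [("add(1, 2)", 3), ("add(-1, 1)", 0), ("add(0, 0)", 0)]
print(run_unit_tests(candidate, tests))

The point the abstract stresses is that the reward signal comes entirely from executing automatically mined function signatures and unit tests, so no human-written tests are needed to apply RL on top of the pre-trained code LM.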

Authors (5)
  1. Philip John Gorinski (12 papers)
  2. Matthieu Zimmer (17 papers)
  3. Gerasimos Lampouras (22 papers)
  4. Derrick Goh Xin Deik (4 papers)
  5. Ignacio Iacobacci (24 papers)
Citations (3)