Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis (2310.13669v1)
Abstract: The advent of large pre-trained Language Models in the domain of Code Synthesis has shown remarkable performance on various benchmarks, treating the problem of Code Generation in a fashion similar to Natural Language Generation, trained with a Language Modelling (LM) objective. In addition, programming language code can be precisely evaluated with respect to its semantics -- through the use of Unit Tests to check its functional correctness -- which lends itself to using Reinforcement Learning (RL) as a further training paradigm. Previous work has shown that RL can be applied in this way to improve models' coding capabilities; however, such RL-based methods rely on a reward signal based on defined Unit Tests, which are much harder to obtain compared to the huge crawled code datasets used for the LM objective. In this work, we present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests, suitable for RL training of Code Synthesis models. We also introduce a simple yet effective Actor-Critic RL training scheme and show that, in conjunction with automatically generated training data, it improves a pre-trained code LM's performance by up to 9.9% over the original underlying code synthesis LM, and by up to 4.3% over RL-based models trained with standard PPO or CodeRL.
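The abstract describes deriving an RL reward signal from Unit Tests attached to a function signature. A minimal sketch of how such a reward could be computed is shown below; the function name, the string-based test format, and the pass-fraction reward are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a unit-test-based reward for RL code synthesis.
# Assumption: each candidate is a self-contained function definition and
# each unit test is a single `assert` statement over that function.

def unit_test_reward(candidate_code: str, unit_tests: list[str]) -> float:
    """Run a generated program against its unit tests and return the
    fraction of tests that pass, i.e. a reward in [0, 1]. A candidate
    that fails to even define itself (syntax/runtime error) earns 0."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # define the candidate function
    except Exception:
        return 0.0  # broken program: no partial credit

    passed = 0
    for test in unit_tests:
        try:
            exec(test, namespace)  # an AssertionError means the test failed
            passed += 1
        except Exception:
            pass
    return passed / len(unit_tests)


# Example: a function signature/body paired with generated unit tests,
# one of which the candidate fails.
candidate = "def add(a, b):\n    return a + b\n"
tests = [
    "assert add(1, 2) == 3",
    "assert add(-1, 1) == 0",
    "assert add(0, 0) == 1",  # deliberately failing test
]
reward = unit_test_reward(candidate, tests)  # 2 of 3 tests pass -> 2/3
```

In an actor-critic setup like the one the abstract describes, this scalar would serve as the terminal reward for a sampled program, with the critic estimating its expected value to reduce gradient variance.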
- A survey of machine learning for big code and naturalness. ACM Computing Surveys (CSUR), 51(4):1–37.
- Program synthesis with large language models. arXiv preprint arXiv:2108.07732.
- Richard Bellman. 1957. A Markovian decision process. Indiana Univ. Math. J., 6(4):679–684.
- On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
- CodeT: Code generation with generated tests. In The Eleventh International Conference on Learning Representations.
- Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
- PanGu-Coder: Program synthesis with function-level language modeling.
- BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, volume 1, page 2.
- The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027.
- Matthew Hausknecht and Nolan Wagener. 2022. Consistent dropout for policy gradient reinforcement learning.
- Measuring coding challenge competence with APPS. arXiv preprint arXiv:2105.09938.
- The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
- Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285.
- Pre-trained contextual embedding of source code.
- Interactive code generation via test-driven user-intent formalization. arXiv preprint arXiv:2208.05950.
- CodeRL: Mastering code generation through pretrained models and deep reinforcement learning. Advances in Neural Information Processing Systems, 35:21314–21328.
- Competition-level Code Generation with AlphaCode.
- CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664.
- CodeGen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474.
- Improving language understanding by generative pre-training. Preprint.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
- Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551.
- Unsupervised translation of programming languages. Advances in Neural Information Processing Systems, 33:20601–20611.
- A survey of evaluation metrics used for NLG systems. ACM Computing Surveys (CSUR), 55(2):1–39.
- High-dimensional continuous control using generalized advantage estimation.
- Proximal policy optimization algorithms. CoRR, abs/1707.06347.
- Richard S Sutton and Andrew G Barto. 2018. Reinforcement learning: An introduction. MIT press.
- StructCoder: Structure-aware transformer for code generation. arXiv preprint arXiv:2206.05239.
- Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Compilable neural code generation with compiler feedback. arXiv preprint arXiv:2203.05132.
- CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv preprint arXiv:2109.00859.
- Fine-tuning language models from human preferences.
- Philip John Gorinski
- Matthieu Zimmer
- Gerasimos Lampouras
- Derrick Goh Xin Deik
- Ignacio Iacobacci