
Coarse-Tuning Models of Code with Reinforcement Learning Feedback (2305.18341v2)

Published 25 May 2023 in cs.PL, cs.AI, and cs.LG

Abstract: LLMs pre-trained on code have recently emerged as the dominant approach to program synthesis. However, these models are trained using next-token prediction, which ignores the syntax and semantics of code. We propose RLCF, which further trains a pre-trained LLM via reinforcement learning, using feedback from a grounding function that scores the quality of the code. The grounding function uses (i) compiler-derived feedback on whether the generated code passes a set of correctness checks; and (ii) feedback from a different LLM that compares the generated code to a reference code. RLCF is model- and language-agnostic. We empirically evaluate it on the MBJP and MathQA tasks for Java. Our experiments show that RLCF raises the odds that an LLM-generated program compiles, is executable, and produces the right output on tests, often allowing LLMs to match the performance of 2x-8x larger LLMs.
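
At a high level, the grounding function turns compiler checks and a discriminator LLM's judgment into a single scalar reward for RL fine-tuning. The Python sketch below illustrates that combination under stated assumptions: `javac` stands in for the paper's correctness checks, a trivial token-overlap proxy stands in for the discriminator LLM, and the helper names and the -1.0 / [0, 1] reward scale are illustrative choices, not the authors' released implementation.

```python
import os
import re
import subprocess
import tempfile


def compiles(java_source: str) -> bool:
    """Compiler-derived check: does the candidate Java program compile?
    (The paper's grounding function also uses further correctness checks;
    this sketch only invokes javac.)"""
    match = re.search(r"\bclass\s+(\w+)", java_source)
    class_name = match.group(1) if match else "Candidate"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, f"{class_name}.java")
        with open(path, "w") as f:
            f.write(java_source)
        result = subprocess.run(["javac", path], capture_output=True)
        return result.returncode == 0


def discriminator_score(candidate: str, reference: str) -> float:
    """Stand-in for the discriminator LLM that compares the candidate to a
    reference program; a token-overlap (Jaccard) proxy is used here purely
    for illustration."""
    cand_tokens, ref_tokens = set(candidate.split()), set(reference.split())
    if not cand_tokens or not ref_tokens:
        return 0.0
    return len(cand_tokens & ref_tokens) / len(cand_tokens | ref_tokens)


def grounding_reward(candidate: str, reference: str) -> float:
    """Combine (i) compiler feedback and (ii) discriminator feedback into one
    scalar reward that a policy-gradient fine-tuning loop could maximize."""
    if not compiles(candidate):
        return -1.0  # assumed hard penalty for code failing the compiler check
    return discriminator_score(candidate, reference)  # assumed range [0, 1]
```

In an RLCF-style training loop, this reward would be computed for each program sampled from the pre-trained code LLM and used as the feedback signal for the reinforcement-learning update.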

Authors (5)
  1. Abhinav Jain (20 papers)
  2. Chima Adiole (1 paper)
  3. Swarat Chaudhuri (61 papers)
  4. Thomas Reps (40 papers)
  5. Chris Jermaine (24 papers)
Citations (1)