Energy-Based Models for Code Generation under Compilability Constraints (2106.04985v1)

Published 9 Jun 2021 in cs.LG, cs.CL, cs.NE, and cs.SE

Abstract: Neural language models can be successfully trained on source code, leading to applications such as code completion. However, their versatile autoregressive self-supervision objective overlooks important global sequence-level features that are present in the data, such as syntactic correctness or compilability. In this work, we pose the problem of learning to generate compilable code as constraint satisfaction. We define an Energy-Based Model (EBM) representing a pre-trained generative model with an imposed constraint of generating only compilable sequences. We then use the KL-Adaptive Distributional Policy Gradient algorithm (Khalifa et al., 2021) to train a generative model approximating the EBM. We conduct experiments showing that our proposed approach is able to improve compilability rates without sacrificing diversity and complexity of the generated samples.
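
To make the construction concrete, the following is a minimal, self-contained sketch rather than the authors' implementation. The EBM scores a sequence as P(x) = a(x) · b(x), where a is the pre-trained model and b(x) is 1 if x compiles and 0 otherwise; a KL-adaptive DPG loop then trains a policy toward that target using importance weights P(x)/q(x). The toy "pre-trained model" over five Python snippets, the use of Python's built-in compile() as the compilability check, and the batch size and learning rate are all illustrative assumptions.

```python
# Toy sketch of KL-adaptive DPG for a compilability-constrained EBM.
# All names and hyperparameters here are illustrative, not from the paper.
import math
import random

# Toy "pre-trained model" a(x): a fixed categorical distribution over snippets.
PROGRAMS = ["x = 1", "print(x)", "def f(:", "return 1", "y = x + 1"]
a = dict(zip(PROGRAMS, [0.3, 0.25, 0.2, 0.15, 0.1]))

def compiles(src: str) -> bool:
    """Binary constraint b(x): does the snippet parse as Python?"""
    try:
        compile(src, "<string>", "exec")
        return True
    except SyntaxError:
        return False

def ebm(x: str) -> float:
    """Unnormalized EBM P(x) = a(x) * b(x): keep only compilable sequences."""
    return a[x] * (1.0 if compiles(x) else 0.0)

def softmax(logits):
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

def kl(p, q):
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)

# Normalized target p(x) = P(x) / Z, tractable here only because the toy
# space is enumerable; used to monitor progress and adapt the proposal.
Z = sum(ebm(x) for x in PROGRAMS)
p = {x: ebm(x) / Z for x in PROGRAMS}

theta = {x: 0.0 for x in PROGRAMS}   # logits of the trained policy pi_theta
q = dict(a)                          # proposal, initialized to the pre-trained model
lr = 0.1

for epoch in range(200):
    pi = softmax(theta)
    for _ in range(64):
        x = random.choices(PROGRAMS, weights=[q[y] for y in PROGRAMS])[0]
        w = ebm(x) / q[x]            # importance weight P(x)/q(x)
        # DPG update: w * grad_theta log pi_theta(x) for a softmax policy.
        for y in PROGRAMS:
            theta[y] += lr * w * ((1.0 if y == x else 0.0) - pi[y])
        pi = softmax(theta)
    # KL-adaptive step: move the proposal to pi_theta when it gets closer to p.
    if kl(p, pi) < kl(p, q):
        q = dict(pi)

print({x: round(softmax(theta)[x], 3) for x in PROGRAMS})
```

In the paper, a(x) is a neural autoregressive model and compilability is checked on generated source code, so the importance weights are estimated from samples rather than enumerated; the proposal update (replacing q with the current policy whenever the estimated divergence from the target improves) is what makes the DPG "KL-adaptive".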

Authors (4)
  1. Tomasz Korbak (24 papers)
  2. Hady Elsahar (21 papers)
  3. Marc Dymetman (21 papers)
  4. Germán Kruszewski (22 papers)
Citations (12)