Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages (2406.03636v4)

Published 5 Jun 2024 in cs.PL and cs.LG

Abstract: Recent advances in LLMs for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code-related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource Programming Languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools, tool-chains for legacy languages, and formal verification frameworks. Inspired by a technique called natural programming elicitation, we propose designing an intermediate language that LLMs "naturally" know how to use and which can be automatically compiled to a target VLPL. When LLMs generate code that lies outside of this intermediate language, we use compiler techniques to repair the code into programs in the intermediate language. Overall, we introduce synthetic programming elicitation and compilation (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate the performance of SPEAC in a case study for the UCLID5 formal verification language and find that, compared to existing retrieval and fine-tuning baselines, SPEAC produces syntactically correct programs more frequently and without sacrificing semantic correctness.
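
The generate, repair, compile flow described in the abstract can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the intermediate language is a tiny Python subset (single-name assignments over names, integer constants, `+` and `-`), the repair step simply replaces out-of-subset expressions with a hole and drops out-of-subset statements, and the target syntax (`x := ...;`, with `??` for holes) is invented. The function names `repair`, `compile_to_target`, and `speac_like_pipeline` are hypothetical.

```python
import ast

# Assumption: a toy intermediate language, NOT the one defined in the paper.
ALLOWED_OPS = {ast.Add: "+", ast.Sub: "-"}
HOLE = "__hole__"  # hypothetical marker for code that could not be kept


def repair_expr(node):
    """Return an expression inside the toy subset; anything else becomes a hole."""
    if isinstance(node, ast.Name):
        return node
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        return node
    if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED_OPS:
        return ast.BinOp(left=repair_expr(node.left), op=node.op,
                         right=repair_expr(node.right))
    return ast.Name(id=HOLE, ctx=ast.Load())


def repair(source):
    """Parse model output and coerce it into (name, expression) assignments.

    Statements outside the subset are dropped in this sketch. Text that is
    not valid Python at all makes ast.parse raise SyntaxError.
    """
    program = []
    for stmt in ast.parse(source).body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            program.append((stmt.targets[0].id, repair_expr(stmt.value)))
    return program


def compile_expr(node):
    """Emit the invented target syntax for a repaired expression."""
    if isinstance(node, ast.Name):
        return "??" if node.id == HOLE else node.id
    if isinstance(node, ast.Constant):
        return str(node.value)
    op = ALLOWED_OPS[type(node.op)]
    return f"({compile_expr(node.left)} {op} {compile_expr(node.right)})"


def compile_to_target(program):
    """Translate the repaired program line by line into the toy target."""
    return "\n".join(f"{name} := {compile_expr(expr)};" for name, expr in program)


def speac_like_pipeline(llm_output):
    """Repair model output into the subset, then compile it."""
    return compile_to_target(repair(llm_output))


if __name__ == "__main__":
    print(speac_like_pipeline("x = 1 + 2\ny = x - foo(3)\nprint(y)"))
```

In this sketch the call `foo(3)` falls outside the subset and becomes a hole, and the `print(y)` statement is dropped, so the compiled output always parses in the toy target. A real system would need a principled repair strategy and a way to fill the holes, which this example does not attempt.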

Authors (9)
  1. Federico Mora (8 papers)
  2. Justin Wong (14 papers)
  3. Haley Lepe (1 paper)
  4. Sahil Bhatia (8 papers)
  5. Karim Elmaaroufi (5 papers)
  6. George Varghese (10 papers)
  7. Joseph E. Gonzalez (167 papers)
  8. Elizabeth Polgreen (20 papers)
  9. Sanjit A. Seshia (105 papers)
Citations (1)