
ReGAL: Refactoring Programs to Discover Generalizable Abstractions (2401.16467v2)

Published 29 Jan 2024 in cs.SE, cs.AI, cs.CL, cs.LG, and cs.PL

Abstract: While LLMs are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality. Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL), a gradient-free method for learning a library of reusable functions via code refactorization, i.e., restructuring code without changing its execution output. ReGAL learns from a small set of existing programs, iteratively verifying and refining its abstractions via execution. We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains. On five datasets -- LOGO graphics generation, Date reasoning, TextCraft (a Minecraft-based text game), MATH, and TabMWP -- both open-source and proprietary LLMs improve in accuracy when predicting programs with ReGAL functions. For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains. Our analysis reveals ReGAL's abstractions encapsulate frequently-used subroutines as well as environment dynamics.

Methodology

The advent of LLMs in program synthesis has enabled a host of complex tasks to be automated through code generation. A critical limitation, however, is that LLMs lack the global view needed to create reusable code abstractions: they tend to produce redundant, non-reusable code for each task independently, which is both inefficient and error-prone. Refactoring for Generalizable Abstraction Learning (ReGAL) is designed to overcome this limitation by refactoring programs into a library of reusable functions that are verified through execution.
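To illustrate the kind of redundancy this refactoring targets, consider a hypothetical turtle-graphics-style example. The function names and the move encoding below are invented for illustration, not drawn from ReGAL's actual learned library: a duplicated drawing loop is extracted into one parameterized helper, and the refactoring is only valid if the program's output is unchanged.

```python
# Hypothetical illustration of extracting a shared abstraction.
# Two programs that each re-implement the same loop can instead
# reuse one parameterized helper; the refactored version must
# reproduce the original output exactly.

def program_square(moves):
    # Original program: redundantly repeats the "step and turn" pattern.
    for _ in range(4):
        moves.append("forward")
        moves.append("turn 90")
    return moves

def draw_polygon(moves, sides):
    # Extracted abstraction: one helper replaces the duplicated
    # loops across many per-task programs.
    for _ in range(sides):
        moves.append("forward")
        moves.append(f"turn {360 // sides}")
    return moves

def program_square_refactored(moves):
    # Refactored program: same behavior, expressed via the library function.
    return draw_polygon(moves, 4)

# Behavior preservation is the correctness criterion.
assert program_square([]) == program_square_refactored([])
```

Once such a helper exists in the shared library, later programs (a triangle, a hexagon) can call it instead of re-deriving the loop, which is the reuse effect the method relies on.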

ReGAL learns from a small set of existing programs and refines its abstractions iteratively. The process is gradient-free: rather than updating model weights, it relies on execution feedback to verify and refine candidate abstractions, keeping only those that preserve program behavior. Crucially, the abstractions learned this way yield significant improvements in code prediction accuracy across various LLMs and datasets.
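The accept/reject rule at the heart of this execution-based verification can be sketched in a few lines. This is a minimal sketch under simplifying assumptions: `run_program` is a placeholder executor, the candidate refactorings would in practice be proposed by an LLM, and error handling is omitted.

```python
# Minimal sketch of execution-based verification (assumed interface, not
# ReGAL's actual code): a candidate refactoring of a batch of programs is
# accepted only if every refactored program still produces its original output.

def verify_refactoring(programs, refactored, run_program):
    """Return True iff the refactoring preserves every program's output."""
    for original, rewritten in zip(programs, refactored):
        if run_program(original) != run_program(rewritten):
            return False  # execution mismatch: reject this abstraction
    return True

# Toy usage: "programs" are Python expressions, executed with eval.
originals = ["2 + 2 + 2", "3 + 3 + 3"]
rewrites_ok = ["2 * 3", "3 * 3"]
rewrites_bad = ["2 * 3", "3 * 4"]

print(verify_refactoring(originals, rewrites_ok, eval))   # True
print(verify_refactoring(originals, rewrites_bad, eval))  # False
```

Because the check is purely behavioral, no gradients or model internals are needed; the executor acts as the sole source of supervision.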

Empirical Results

ReGAL yields consistent gains in the accuracy and efficiency of code synthesis. With CodeLlama-13B, it achieves absolute accuracy increases of 11.5% on LOGO graphics generation, 26.1% on date understanding, and 8.1% on TextCraft. Notably, CodeLlama-13B equipped with ReGAL's libraries outperforms the larger GPT-3.5 in two of the three domains. These results underscore ReGAL's potential to generalize across diverse functions and applications.

Comparative Analysis

Relative to existing methods, ReGAL is distinctive in relying exclusively on an LLM for both refactoring and program prediction, whereas earlier library-learning work typically depended on symbolic search. Its gradient-free paradigm also lets it operate on a general-purpose language like Python and learn from LLM-generated programs without requiring human annotations, in contrast to systems that depend on extensive human input.

Conclusion

ReGAL represents a notable advance at the intersection of LLMs and program synthesis. It demonstrates a clear capability to refactor existing code into reusable, generalizable abstractions, streamlining the program prediction process and significantly improving accuracy. Its effectiveness across varied datasets reinforces its adaptability and the value of developing shared code libraries for task execution. Going forward, approaches like ReGAL may help set new standards for efficiency and reliability in automated program synthesis.

Authors
  1. Elias Stengel-Eskin
  2. Archiki Prasad
  3. Mohit Bansal