
ReGAL: Refactoring Programs to Discover Generalizable Abstractions (2401.16467v2)

Published 29 Jan 2024 in cs.SE, cs.AI, cs.CL, cs.LG, and cs.PL

Abstract: While LLMs are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality. Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL), a gradient-free method for learning a library of reusable functions via code refactorization, i.e., restructuring code without changing its execution output. ReGAL learns from a small set of existing programs, iteratively verifying and refining its abstractions via execution. We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains. On five datasets -- LOGO graphics generation, Date reasoning, TextCraft (a Minecraft-based text game), MATH, and TabMWP -- both open-source and proprietary LLMs improve in accuracy when predicting programs with ReGAL functions. For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains. Our analysis reveals ReGAL's abstractions encapsulate frequently-used subroutines as well as environment dynamics.

Methodology

The advent of LLMs in program synthesis has enabled a host of complex tasks to be automated through code generation. A critical limitation, however, is that LLMs lack the global view needed to create reusable code abstractions: they tend to produce redundant, non-reusable code for each task independently, which is both inefficient and error-prone. Refactoring for Generalizable Abstraction Learning (ReGAL) is designed to overcome this limitation by refactoring programs into a library of reusable functions that are verified through execution.
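To illustrate the kind of redundancy this refactoring targets, consider a hypothetical turtle-graphics-style example. The function names and the move encoding below are invented for illustration, not drawn from ReGAL's actual learned library: a duplicated drawing loop is extracted into one parameterized helper, and the refactoring is only valid if the program's output is unchanged.

```python
# Hypothetical illustration of extracting a shared abstraction.
# Two programs that each re-implement the same loop can instead
# reuse one parameterized helper; the refactored version must
# reproduce the original output exactly.

def program_square(moves):
    # Original program: redundantly repeats the "step and turn" pattern.
    for _ in range(4):
        moves.append("forward")
        moves.append("turn 90")
    return moves

def draw_polygon(moves, sides):
    # Extracted abstraction: one helper replaces the duplicated
    # loops across many per-task programs.
    for _ in range(sides):
        moves.append("forward")
        moves.append(f"turn {360 // sides}")
    return moves

def program_square_refactored(moves):
    # Refactored program: same behavior, expressed via the library function.
    return draw_polygon(moves, 4)

# Behavior preservation is the correctness criterion.
assert program_square([]) == program_square_refactored([])
```

Once such a helper exists in the shared library, later programs (a triangle, a hexagon) can call it instead of re-deriving the loop, which is the reuse effect the method relies on.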

ReGAL learns from a small set of existing programs and refines its abstractions iteratively. The process is gradient-free: rather than updating model weights, it relies on execution feedback to verify and refine candidate abstractions, keeping only those that preserve program behavior. Crucially, the abstractions learned this way yield significant improvements in code prediction accuracy across various LLMs and datasets.
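The accept/reject rule at the heart of this execution-based verification can be sketched in a few lines. This is a minimal sketch under simplifying assumptions: `run_program` is a placeholder executor, the candidate refactorings would in practice be proposed by an LLM, and error handling is omitted.

```python
# Minimal sketch of execution-based verification (assumed interface, not
# ReGAL's actual code): a candidate refactoring of a batch of programs is
# accepted only if every refactored program still produces its original output.

def verify_refactoring(programs, refactored, run_program):
    """Return True iff the refactoring preserves every program's output."""
    for original, rewritten in zip(programs, refactored):
        if run_program(original) != run_program(rewritten):
            return False  # execution mismatch: reject this abstraction
    return True

# Toy usage: "programs" are Python expressions, executed with eval.
originals = ["2 + 2 + 2", "3 + 3 + 3"]
rewrites_ok = ["2 * 3", "3 * 3"]
rewrites_bad = ["2 * 3", "3 * 4"]

print(verify_refactoring(originals, rewrites_ok, eval))   # True
print(verify_refactoring(originals, rewrites_bad, eval))  # False
```

Because the check is purely behavioral, no gradients or model internals are needed; the executor acts as the sole source of supervision.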

Empirical Results

ReGAL yields consistent gains in the accuracy and efficiency of code synthesis. With CodeLlama-13B, it achieves absolute accuracy increases of 11.5% on LOGO graphics generation, 26.1% on date understanding, and 8.1% on TextCraft. Notably, CodeLlama-13B equipped with ReGAL's libraries outperforms the larger GPT-3.5 in two of the three domains. These results underscore ReGAL's potential to generalize across diverse functions and applications.

Comparative Analysis

Relative to existing methods, ReGAL is distinctive in relying exclusively on an LLM for both refactoring and program prediction, whereas earlier library-learning work typically depended on symbolic search. Its gradient-free paradigm also lets it operate on a general-purpose language like Python and learn from LLM-generated programs without requiring human annotations, in contrast to systems that depend on extensive human input.

Conclusion

ReGAL represents a notable advance at the intersection of LLMs and program synthesis. It demonstrates a clear capability to refactor existing code into reusable, generalizable abstractions, streamlining the program prediction process and significantly improving accuracy. Its effectiveness across varied datasets reinforces its adaptability and the value of developing shared code libraries for task execution. Going forward, approaches like ReGAL may help set new standards for efficiency and reliability in automated program synthesis.

Authors
  1. Elias Stengel-Eskin
  2. Archiki Prasad
  3. Mohit Bansal