Function-constrained Program Synthesis (2311.15500v2)
Abstract: This work introduces (1) a technique that allows LLMs to leverage user-provided code when solving programming tasks and (2) a method to iteratively generate modular sub-functions that can aid future code-generation attempts when the initial code generated by the LLM is inadequate. Generating computer programs in general-purpose programming languages like Python poses a challenge for LLMs when they are instructed to use code provided in the prompt. Code-specific LLMs (e.g., GitHub Copilot, CodeLlama2) can generate code completions in real time by drawing on all code available in a development environment. However, restricting code-specific LLMs to use only in-context code is not straightforward: the model is not explicitly instructed to use the user-provided code, and users cannot highlight precisely which snippets of code the model should incorporate into its context. Moreover, current systems lack effective recovery methods, forcing users to re-prompt the model with modified prompts until a sufficient solution is reached. Our method differs from traditional LLM-powered code generation by constraining generation to an explicit function set and enabling recovery from failed attempts through automatically generated sub-functions. When the LLM cannot produce working code, we generate modular sub-functions to aid subsequent attempts at generating functional code. A by-product of our method is a library of reusable sub-functions that can solve related tasks, imitating a software team whose efficiency scales with experience. We also introduce a new "half-shot" evaluation paradigm that provides tighter estimates of LLMs' coding abilities than traditional zero-shot evaluation. Our proposed evaluation method encourages models to output solutions in a structured format, decreasing syntax errors that can be mistaken for poor coding ability.
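The loop the abstract describes (constrain generation to a declared function set, then recover from failures by synthesizing helper sub-functions) can be made concrete with a short sketch. The sketch below is a minimal, hypothetical rendering, not the authors' implementation: `constrained_synthesis`, the `llm` and `run_tests` callables, and the prompt wording are all illustrative assumptions.

```python
from typing import Callable

def constrained_synthesis(
    task: str,
    allowed_functions: dict[str, Callable],
    llm: Callable[[str], str],        # assumed: prompt in, code text out
    run_tests: Callable[[str], bool], # assumed: True if the code passes the task's tests
    max_attempts: int = 3,
) -> tuple[str | None, dict[str, Callable]]:
    """Ask `llm` for code restricted to `allowed_functions`; on failure, grow
    the function set with LLM-proposed sub-functions and try again."""
    library = dict(allowed_functions)  # the reusable library grows across attempts
    for _ in range(max_attempts):
        prompt = (
            f"Task: {task}\n"
            f"You may ONLY call these functions: {sorted(library)}.\n"
            "Return a single Python function named `solve`."
        )
        code = llm(prompt)
        if run_tests(code):
            return code, library  # success; the grown library is a reusable by-product
        # Recovery step: ask for small, modular helpers that decompose the task,
        # load them, and let the next attempt build on the enlarged function set.
        helpers = llm(
            "The previous solution to the task below failed its tests.\n"
            f"Task: {task}\n"
            "Write small, reusable Python helper functions that decompose it."
        )
        namespace: dict = {}
        exec(helpers, namespace)  # unguarded exec: acceptable in a sketch, not in production
        library.update(
            {name: fn for name, fn in namespace.items()
             if callable(fn) and not name.startswith("_")}
        )
    return None, library  # no working solution within the attempt budget
```

Returning the grown `library` alongside the solution mirrors the abstract's observation that the accumulated sub-functions become a reusable asset for related tasks.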
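The abstract does not spell out the "half-shot" output format, so the following is only one plausible reading: the harness asks the model to return its solution inside a fenced code block, then extracts and syntax-checks that block before functional scoring, so that formatting slips are not counted as coding failures. `parse_structured_solution` and `FENCE` are hypothetical names.

```python
import ast
import re

FENCE = "`" * 3  # built dynamically so this example itself stays fence-safe

def parse_structured_solution(llm_output: str) -> str | None:
    """Extract the first fenced Python block from a model response and verify
    that it parses, so a syntax slip is not scored as failed problem solving."""
    match = re.search(FENCE + r"(?:python)?\n(.*?)" + FENCE, llm_output, re.DOTALL)
    if match is None:
        return None  # no structured block found; record as a format failure
    code = match.group(1)
    try:
        ast.parse(code)  # syntactic check only; semantics are left to the tests
    except SyntaxError:
        return None
    return code
```

Reporting a failed parse separately from failed tests is consistent with the abstract's claim that structured output reduces syntax errors that would otherwise be mistaken for poor coding ability.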
Authors: Patrick Hajali, Ignas Budvytis