Function-constrained Program Synthesis (2311.15500v2)

Published 27 Nov 2023 in cs.LG, cs.CL, and cs.PL

Abstract: This work introduces (1) a technique that allows LLMs to leverage user-provided code when solving programming tasks and (2) a method to iteratively generate modular sub-functions that can aid future code-generation attempts when the initial code generated by the LLM is inadequate. Generating computer programs in general-purpose programming languages like Python poses a challenge for LLMs when they are instructed to use code provided in the prompt. Code-specific LLMs (e.g., GitHub Copilot, CodeLlama2) can generate code completions in real-time by drawing on all code available in a development environment. However, restricting code-specific LLMs to use only in-context code is not straightforward: the model is not explicitly instructed to use the user-provided code, and users cannot highlight precisely which snippets the model should incorporate into its context. Moreover, current systems lack effective recovery methods, forcing users to iteratively re-prompt the model with modified prompts until a sufficient solution is reached. Our method differs from traditional LLM-powered code generation in that it constrains generation to an explicit function set and enables recovery from failed attempts through automatically generated sub-functions. When the LLM cannot produce working code, we generate modular sub-functions to aid subsequent attempts at generating functional code. A by-product of our method is a library of reusable sub-functions that can solve related tasks, imitating a software team whose efficiency scales with experience. We also introduce a new "half-shot" evaluation paradigm that provides tighter estimates of LLMs' coding abilities than traditional zero-shot evaluation. Our evaluation method encourages models to output solutions in a structured format, decreasing the syntax errors that can be mistaken for poor coding ability.
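The abstract describes an iterate-and-recover loop: constrain the model to an explicit function set, test the result, and on failure generate modular sub-functions that seed the next attempt while accumulating into a reusable library. The sketch below illustrates that control flow only; the callable names (`generate`, `passes_tests`, `propose_subfunctions`), prompt wording, and retry budget are assumptions for illustration, not the authors' actual implementation.

```python
from typing import Callable, Optional

def synthesize(
    task: str,
    function_set: list[str],
    generate: Callable[[str], str],            # LLM completion call (caller-supplied stub)
    passes_tests: Callable[[str], bool],       # unit-test harness for candidate code
    propose_subfunctions: Callable[[str, str], list[str]],  # recovery step
    max_attempts: int = 3,
) -> Optional[str]:
    """Try to solve `task` using only the functions named in `function_set`;
    on failure, grow the set with generated sub-functions and retry."""
    library = list(function_set)  # reusable sub-function library (the by-product)
    for _ in range(max_attempts):
        # Constrain generation: the prompt enumerates the allowed functions.
        prompt = (
            "Solve the task using ONLY these functions:\n"
            + "\n".join(library)
            + f"\n\nTask: {task}"
        )
        candidate = generate(prompt)
        if passes_tests(candidate):
            return candidate
        # Recovery: request modular sub-functions that decompose the task,
        # and make them available to the next attempt.
        library.extend(propose_subfunctions(task, candidate))
    return None  # all attempts exhausted
```

In this sketch the library persists across attempts, and could equally persist across tasks, which is what would yield the growing store of reusable sub-functions the abstract mentions.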

Authors (2)
  1. Patrick Hajali (1 paper)
  2. Ignas Budvytis (26 papers)