
ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle Verifiers (2305.14591v3)

Published 24 May 2023 in cs.CL and cs.SE

Abstract: LLMs excel at implementing code from functionality descriptions but struggle with algorithmic problems that require not only implementation but also identification of the suitable algorithm. Moreover, LLM-generated programs lack guaranteed correctness and require human verification. To address these challenges, we propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify their correctness. ALGO first generates a reference oracle by prompting an LLM to exhaustively enumerate all the combinations of relevant variables. This oracle is then utilized to guide an arbitrary search strategy in exploring the algorithm space and to verify the synthesized algorithms. Our study shows that the LLM-generated oracles are correct for 88% of the cases. With the oracles as verifiers, ALGO can be integrated with any existing code generation model in a model-agnostic manner to enhance its performance. Experiments show that when equipped with ALGO, we achieve an 8x better one-submission pass rate over the Codex model and a 2.6x better one-submission pass rate over CodeT, the current state-of-the-art model on CodeContests. We can also get 1.3x better pass rate over the ChatGPT Code Interpreter on unseen problems. The problem set we used for testing, the prompts we used, the verifier and solution programs, and the test cases generated by ALGO are available at https://github.com/zkx06111/ALGO.
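Illustration (not from the paper): the abstract describes guiding and checking candidate programs against an exhaustive-enumeration reference oracle. The sketch below shows that verification idea in miniature for a toy two-sum task; the function names (brute_force_oracle, candidate_solution, gen_input, verify) and the random test-input generator are hypothetical placeholders, not ALGO's actual interface, and in ALGO the oracle and test inputs would themselves be LLM-generated.

import random

def brute_force_oracle(nums, target):
    # Reference oracle: exhaustively enumerate all index pairs.
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            if nums[i] + nums[j] == target:
                return True
    return False

def candidate_solution(nums, target):
    # A candidate efficient program, e.g. produced by a code-generation model.
    seen = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False

def gen_input():
    # Small random test case (stand-in for LLM-generated test inputs).
    nums = [random.randint(-20, 20) for _ in range(random.randint(1, 10))]
    target = random.randint(-40, 40)
    return nums, target

def verify(candidate, oracle, trials=200):
    # Accept the candidate only if it agrees with the oracle on every sampled input.
    for _ in range(trials):
        nums, target = gen_input()
        if candidate(nums, target) != oracle(nums, target):
            return False, (nums, target)  # counterexample found
    return True, None

if __name__ == "__main__":
    ok, counterexample = verify(candidate_solution, brute_force_oracle)
    print("verified" if ok else f"mismatch on {counterexample}")

In the same spirit as the paper's verifier-guided search, a candidate rejected here (with its counterexample) could be fed back to the generation model for another attempt.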

Authors (5)
  1. Kexun Zhang (21 papers)
  2. Danqing Wang (37 papers)
  3. Jingtao Xia (3 papers)
  4. William Yang Wang (254 papers)
  5. Lei Li (1293 papers)
Citations (28)

