HumanEval on Latest GPT Models -- 2024 (2402.14852v1)
Abstract: In 2023, we are using the latest GPT-4 models to advance program synthesis. These LLMs have significantly improved the state of the art for this task. To make these advances more accessible, we have created a repository that connects these models to HumanEval. This dataset was initially developed for use with an LLM called CODEGEN, trained on natural and programming language data. We showcase the utility of these trained models by demonstrating their competitive performance in zero-shot Python code generation on HumanEval tasks compared to previous state-of-the-art solutions. Additionally, this paves the way for developing more multi-step synthesis paradigms. The benchmark features 160 diverse problem sets factorized into multi-step prompts, which our analysis shows significantly improve program synthesis over single-turn inputs. All code is open source at https://github.com/daniel442li/gpt-human-eval.
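To make the evaluation setting concrete, the following is a minimal sketch of how a HumanEval-style task is typically scored: the model's completion is appended to the task prompt, the task's unit tests are run against the result, and pass rates are summarized with the unbiased pass@k estimator commonly reported for this benchmark. It is not taken from the repository above. The toy task, the helper names `check_completion` and `pass_at_k`, and the hard-coded completions (standing in for model outputs) are assumptions made for illustration.

```python
from math import comb


def check_completion(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if prompt + completion passes the task's unit tests.

    Hypothetical helper. It runs code with exec() and no sandbox, so it is
    only suitable for trusted toy inputs, not for real model output.
    """
    program = prompt + completion + "\n" + test + "\n" + f"check({entry_point})\n"
    namespace: dict = {}
    try:
        exec(program, namespace)
        return True
    except Exception:
        # Any failed assertion or runtime error counts as a failed sample.
        return False


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n = samples drawn for a task, c = samples that passed, k = budget.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# A toy task laid out like a HumanEval record (prompt, test, entry point).
TOY_PROMPT = 'def add(a, b):\n    """Return the sum of a and b."""\n'
TOY_TEST = (
    "def check(candidate):\n"
    "    assert candidate(2, 3) == 5\n"
    "    assert candidate(-1, 1) == 0\n"
)

# Stand-ins for two sampled model completions: one correct, one wrong.
completions = ["    return a + b\n", "    return a - b\n"]

results = [check_completion(TOY_PROMPT, c, TOY_TEST, "add") for c in completions]
score = pass_at_k(n=len(results), c=sum(results), k=1)
print(results, score)
```

In a real run, the completions would come from the model under test and the execution step would be isolated in a sandboxed subprocess with a timeout.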
Authors: Daniel Li, Lincoln Murr