Explaining Competitive-Level Programming Solutions using LLMs (2307.05337v1)

Published 11 Jul 2023 in cs.CL

Abstract: In this paper, we approach competitive-level programming problem-solving as a composite task of reasoning and code generation. We propose a novel method to automatically annotate natural language explanations to ⟨problem, solution⟩ pairs. We show that despite poor performance in solving competitive-level programming problems, state-of-the-art LLMs exhibit a strong capacity for describing and explaining solutions. Our explanation generation methodology can generate a structured solution explanation for the problem, containing both descriptions and analysis. To evaluate the quality of the annotated explanations, we examine their effectiveness in two aspects: 1) satisfying the human programming expert who authored the oracle solution, and 2) aiding LLMs in solving problems more effectively. The experimental results on the CodeContests dataset demonstrate that while GPT-3.5 and GPT-4 are comparable in their ability to describe the solution, GPT-4 shows a better understanding of the key idea behind it.
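
As a rough illustration of the pipeline the abstract describes (prompt an LLM to produce a structured explanation of a ⟨problem, solution⟩ pair, then feed that explanation back as a hint when solving), here is a minimal sketch. It assumes the OpenAI Python client; the prompt wording, function names, and model choice are illustrative assumptions, not the paper's actual prompts or code.

```python
# Minimal sketch of the explain-then-solve idea from the abstract.
# Assumptions (not from the paper): the OpenAI chat API is used, and the
# prompt wording below is illustrative, not the authors' actual prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def explain_solution(problem: str, solution: str, model: str = "gpt-4") -> str:
    """Ask the model for a structured explanation of an oracle solution."""
    prompt = (
        "Here is a competitive programming problem and a correct solution.\n\n"
        f"Problem:\n{problem}\n\nSolution:\n{solution}\n\n"
        "Explain the solution in a structured way: first describe what the "
        "code does, then analyze the key idea behind the approach."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def solve_with_hint(problem: str, explanation: str, model: str = "gpt-4") -> str:
    """Attempt the problem again, conditioning on the generated explanation."""
    prompt = (
        f"Problem:\n{problem}\n\n"
        f"A hint describing one correct approach:\n{explanation}\n\n"
        "Write a complete program that solves the problem."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

In this reading of the method, the second call operationalizes the paper's second evaluation criterion: an explanation is useful to the extent that conditioning on it improves the model's solve rate on the same problem.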

Authors (4)
  1. Jierui Li
  2. Szymon Tworkowski
  3. Yingying Wu
  4. Raymond Mooney
Citations (13)