Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs (2404.08148v1)

Published 11 Apr 2024 in cs.CL

Abstract: Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of LLMs across various tasks. However, when tackling complex tasks that pose significant challenges for state-of-the-art models, this technique often struggles to produce effective chains of thought that lead to correct answers. In this work, we propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions. We apply our method to solving competitive-level programming challenges. More specifically, we employ an LLM to generate explanations for a set of <problem, solution-program> pairs, then use <problem, explanation> pairs to fine-tune a smaller LLM, which we refer to as the Reasoner, to learn algorithmic reasoning that can generate "how-to-solve" hints for unseen problems. Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder, resulting in higher solve rates than strong chain-of-thought baselines on competitive-level programming problems. It also outperforms models that learn directly from <problem, solution-program> pairs. We curated an additional test set in the CodeContests format, which includes 246 more recent problems posted after the models' knowledge cutoff.
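
The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `SolvedProblem`, `build_reasoner_dataset`, `solve_with_hint`, and the `prompt`/`completion` record format are all hypothetical, and the three `toy_*` functions are placeholders standing in for the explaining LLM, the fine-tuned Reasoner, and the Coder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SolvedProblem:
    problem: str           # natural-language problem statement
    solution_program: str  # a known-correct program for that problem


def build_reasoner_dataset(
    pairs: List[SolvedProblem],
    explain: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """Turn <problem, solution-program> pairs into <problem, explanation> records.

    `explain` stands in for the larger LLM that explains a given solution.
    Only the explanation, not the program, becomes the fine-tuning target.
    """
    return [
        {"prompt": p.problem, "completion": explain(p.problem, p.solution_program)}
        for p in pairs
    ]


def solve_with_hint(
    problem: str,
    reasoner: Callable[[str], str],
    coder: Callable[[str, str], str],
) -> str:
    """At test time: the Reasoner writes a how-to-solve hint, the Coder implements it."""
    hint = reasoner(problem)
    return coder(problem, hint)


# Toy stand-ins (assumptions for illustration only; real models would go here).
def toy_explainer(problem: str, program: str) -> str:
    return f"Explanation of how the program solves: {problem}"


def toy_reasoner(problem: str) -> str:
    return f"Hint for: {problem}"


def toy_coder(problem: str, hint: str) -> str:
    return f"# {hint}\nprint('placeholder program')"


pairs = [SolvedProblem("Sum two integers.", "a, b = map(int, input().split()); print(a + b)")]
dataset = build_reasoner_dataset(pairs, toy_explainer)
program = solve_with_hint("Reverse a string.", toy_reasoner, toy_coder)
```

The point of the split is that the smaller model is trained to produce explanations rather than programs, so at inference time it supplies reasoning that a separate Coder turns into code.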

Authors (2)
  1. Jierui Li (6 papers)
  2. Raymond Mooney (21 papers)