First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning (2311.07945v3)

Published 14 Nov 2023 in cs.CL

Abstract: LLMs can solve complex reasoning tasks better by learning to generate rationales for their predictions. Often these models know how to solve a task, but their auto-regressive decoding leads to incorrect results if they start incorrectly. We observe that smaller models in particular, when given a corrected start, can solve tasks they would otherwise have struggled with. We demonstrate this phenomenon by using a larger model to guide smaller models, which leads to significantly improved performance (up to +24 points on the GSM8K dataset for 7B models). To help smaller models initiate the starting step correctly, we propose QuestCoT, where a smaller model first asks itself how to start before proceeding with a chain of reasoning. On several multi-step mathematical reasoning datasets and multiple smaller models, we show that getting the right start leads to significant performance gains across all models (up to +6 points on GSM8K, +9 on SVAMP, +5 on ASDiv, and +7 on MultiArith).
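
The abstract describes QuestCoT only at a high level; the sketch below illustrates one plausible reading of the two-stage scheme. The prompt wording, the `generate` stub, and the `first_step_hint` parameter are hypothetical illustrations, not the authors' released code; only the two-stage structure (ask how to start, then continue the chain of reasoning from that first step) follows the abstract.

```python
def generate(prompt: str) -> str:
    """Stub for a call to a small LLM (e.g., a 7B model).

    Swap in your own model or API call; this placeholder just raises.
    """
    raise NotImplementedError("plug in an LLM call here")


def questcot_solve(question: str, first_step_hint: str | None = None) -> str:
    """QuestCoT-style two-stage prompting sketch.

    Stage 1 asks the model how to start; stage 2 conditions the chain
    of thought on that first step. If `first_step_hint` is supplied
    (e.g., produced by a larger guide model), stage 1 is skipped,
    mirroring the paper's larger-model-guides-smaller-model setting.
    """
    # Stage 1: elicit the first step before any solution is attempted,
    # so auto-regressive decoding begins from a correct starting point.
    first_step = first_step_hint or generate(
        f"Question: {question}\n"
        "Before solving, state the correct first step for this problem.\n"
        "First step:"
    )

    # Stage 2: continue the reasoning conditioned on that first step.
    return generate(
        f"Question: {question}\n"
        f"First step: {first_step}\n"
        "Continue the reasoning step by step and give the final answer."
    )
```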
