Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering (2403.19167v1)

Published 28 Mar 2024 in cs.CL and cs.AI

Abstract: LLMs have demonstrated remarkable capabilities by leveraging chain-of-thought (CoT) reasoning to solve intricate questions through step-by-step reasoning chains. Despite this success, the efficacy of such reasoning is inherently contingent on the quality of the CoT. Flawless CoT reasoning cannot be guaranteed, however, owing to the presence of indecomposable questions and the potential for erroneous reasoning chains, particularly with small-scale LLMs. To tackle this challenge, we propose the selective filtering reasoner (SelF-Reasoner), which assesses the entailment relationship between the question and the candidate reasoning chain. We proceed with CoT reasoning when the reasoning chain is judged trustworthy; otherwise, we predict the answer directly. SelF-Reasoner consistently improves over the fine-tuned T5 baseline on the ScienceQA, ECQA, and LastLetter tasks. Code is available at https://github.com/LibroWu/SelF-Reasoner.
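
The abstract describes a two-stage decision rule: generate a candidate reasoning chain, score how well it fits the question, and fall back to direct answering when the score is low. The sketch below illustrates that control flow only; the checkpoint names, prompt templates, pair-classifier formulation, and 0.5 threshold are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of selective filtering over CoT, assuming a seq2seq CoT
# generator and a separate binary "chain quality" classifier. Checkpoint
# names, prompts, and the threshold are placeholders, not the paper's
# released artifacts.
import torch
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

GEN_NAME = "t5-base"            # placeholder: a fine-tuned CoT generator
CLS_NAME = "bert-base-uncased"  # placeholder: a fine-tuned chain filter
THRESHOLD = 0.5                 # placeholder confidence cutoff

gen_tok = AutoTokenizer.from_pretrained(GEN_NAME)
gen_model = AutoModelForSeq2SeqLM.from_pretrained(GEN_NAME)
cls_tok = AutoTokenizer.from_pretrained(CLS_NAME)
cls_model = AutoModelForSequenceClassification.from_pretrained(CLS_NAME, num_labels=2)


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy-decode a completion from the seq2seq model."""
    ids = gen_tok(prompt, return_tensors="pt").input_ids
    out = gen_model.generate(ids, max_new_tokens=max_new_tokens)
    return gen_tok.decode(out[0], skip_special_tokens=True)


def chain_confidence(question: str, chain: str) -> float:
    """Score how trustworthy the candidate reasoning chain looks for the question."""
    enc = cls_tok(question, chain, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = cls_model(**enc).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()  # P(chain is reliable)


def self_reasoner(question: str) -> str:
    """Selective filtering: use CoT only when the candidate chain passes the filter."""
    chain = generate(f"Reason step by step: {question}")
    if chain_confidence(question, chain) >= THRESHOLD:
        # Confident chain: answer conditioned on the reasoning.
        return generate(f"{question}\nReasoning: {chain}\nAnswer:")
    # Low-confidence chain: discard it and predict the answer directly.
    return generate(f"{question}\nAnswer:")


if __name__ == "__main__":
    print(self_reasoner("Take the last letters of 'selective filtering' and concatenate them."))
```

In the paper both stages build on fine-tuned T5 models; the BERT-style pair classifier above is only a convenient stand-in for the chain-quality judgment, and an untrained classification head would need task-specific fine-tuning before the scores mean anything.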
