
Divide and Conquer for Large Language Models Reasoning

(2401.05190)
Published Jan 10, 2024 in cs.CL

Abstract

Large language models (LLMs) have shown impressive performance on various reasoning benchmarks with the emergence of Chain-of-Thought (CoT) and its derivative methods, particularly in tasks involving multiple-choice questions (MCQs). However, current works process all data uniformly without considering problem-solving difficulty, which means excessive focus on simple questions and insufficient attention to intricate ones. To address this challenge, inspired by how humans use heuristic strategies to categorize tasks and handle them individually, we propose applying Divide and Conquer to LLM reasoning. First, we divide questions into different subsets based on the statistical confidence score ($\mathcal{CS}$); we then fix the nearly resolved sets and conquer those demanding nuanced processing with elaborately designed methods, including Prior Knowledge based Reasoning (PKR) and Filter Choices based Reasoning (FCR), as well as their integrated variants. Our experiments demonstrate that the proposed strategy significantly boosts the models' reasoning abilities across nine datasets involving arithmetic, commonsense, and logic tasks. For instance, compared to the baseline, we achieve striking improvements on low-confidence subsets of 8.72\% for AQuA, 15.07\% for ARC Challenge, and 7.71\% for RiddleSense. In addition, through extensive analysis of rationale length and number of options, we verify that the longer reasoning paths in PKR keep models from resorting to inference-harmful shortcuts, and that removing irrelevant choices in FCR substantially reduces models' confusion. The code is at \url{https://github.com/AiMijie/Divide-and-Conquer}
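The abstract does not spell out how the statistical confidence score $\mathcal{CS}$ is computed; a natural reading is a self-consistency-style measure: sample several CoT answers per question and take the fraction agreeing with the majority vote. The sketch below illustrates the "divide" step under that assumption — the `confidence_score` and `divide` helpers and the `threshold=0.8` cutoff are hypothetical, not taken from the paper.

```python
from collections import Counter

def confidence_score(answers):
    """Return the majority answer and the fraction of samples agreeing with it
    (an assumed proxy for the paper's statistical confidence score CS)."""
    top_answer, top_count = Counter(answers).most_common(1)[0]
    return top_answer, top_count / len(answers)

def divide(question_samples, threshold=0.8):
    """Split questions into a high-confidence ('fix') subset and a
    low-confidence ('conquer') subset that PKR/FCR would then handle."""
    fix, conquer = {}, {}
    for q, sampled in question_samples.items():
        answer, cs = confidence_score(sampled)
        (fix if cs >= threshold else conquer)[q] = (answer, cs)
    return fix, conquer

# Three MCQs, each with answers sampled from five independent CoT runs.
samples = {
    "q1": ["B", "B", "B", "B", "B"],  # unanimous -> nearly resolved
    "q2": ["A", "C", "A", "D", "A"],  # split vote -> needs nuanced processing
    "q3": ["D", "D", "D", "D", "A"],  # mostly agrees
}
fix, conquer = divide(samples, threshold=0.8)
print(sorted(fix))      # ['q1', 'q3']
print(sorted(conquer))  # ['q2']
```

On this reading, questions whose sampled answers already agree are left as-is, while the remaining low-confidence subset is routed to the costlier PKR/FCR methods.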


