DCR: Divide-and-Conquer Reasoning for Multi-choice Question Answering with LLMs (2401.05190v2)

Published 10 Jan 2024 in cs.CL

Abstract: LLMs have shown impressive performance on reasoning benchmarks with the emergence of Chain-of-Thought (CoT) prompting, particularly on multi-choice questions (MCQs). However, current approaches treat all questions equally regardless of their difficulty, devoting excessive effort to simple items while paying insufficient attention to intricate ones. To address this challenge, we propose a simple yet effective strategy, Divide and Conquer Reasoning (DCR), to enhance the reasoning capability of LLMs on MCQs, inspired by the way humans use heuristics to first categorize tasks and then handle them separately. In particular, we first divide questions into two subsets based on a confidence score ($\mathcal{CS}$), estimated from the statistical frequency of the model's generated answers. Subsequently, we propose Filter Choices based Reasoning (FCR) to improve model performance on MCQs with low $\mathcal{CS}$. Our experiments demonstrate that the proposed strategy costs only 85% as much as the SOTA method while achieving an average accuracy improvement of 1.56% across nine datasets spanning arithmetic, commonsense, and logical reasoning tasks. The code is at \url{https://github.com/AiMijie/Divide-and-Conquer}
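
As a concrete illustration, here is a minimal Python sketch of the pipeline described above, assuming a callable `llm(question, choices)` that samples a single CoT answer: the confidence score is estimated as the frequency of the most common answer across $k$ sampled generations, high-$\mathcal{CS}$ questions are resolved by majority vote, and low-$\mathcal{CS}$ questions are routed to FCR. The sample count, the routing threshold, and the exact FCR filtering rule below are illustrative assumptions, not the paper's exact settings.

```python
from collections import Counter


def sample_answers(llm, question, choices, k=5):
    """Sample k chain-of-thought answers for one MCQ.
    `llm(question, choices)` is an assumed stand-in for any
    sampling-based CoT call that returns one choice label."""
    return [llm(question, choices) for _ in range(k)]


def confidence_score(answers):
    """CS = statistical frequency of the most common sampled answer."""
    top_answer, top_count = Counter(answers).most_common(1)[0]
    return top_answer, top_count / len(answers)


def filter_choices_reasoning(llm, question, choices, answers):
    """FCR sketch: keep only the choices the model ever selected during
    sampling, then re-ask on the reduced choice set. (This particular
    filtering criterion is an assumption, not the paper's exact rule.)"""
    seen = set(answers)
    kept = [c for c in choices if c in seen] or choices
    return llm(question, kept)


def dcr(llm, question, choices, k=5, threshold=0.8):
    """Divide-and-Conquer Reasoning: route each MCQ by its CS."""
    answers = sample_answers(llm, question, choices, k)
    top_answer, cs = confidence_score(answers)
    if cs >= threshold:  # high-CS subset: trust the majority vote
        return top_answer
    # low-CS subset: apply Filter Choices based Reasoning
    return filter_choices_reasoning(llm, question, choices, answers)
```

Because low-$\mathcal{CS}$ questions incur only one extra call on a reduced choice set, routing in this way can cost less than uniformly applying heavier reasoning to every question, which is consistent with the cost savings the abstract reports.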

