
Large Language Models can Learn Rules (2310.07064v3)

Published 10 Oct 2023 in cs.AI and cs.CL

Abstract: When prompted with a few examples and intermediate steps, LLMs have demonstrated impressive performance in various reasoning tasks. However, prompting methods that rely on implicit knowledge in an LLM often generate incorrect answers when the implicit knowledge is wrong or inconsistent with the task. To tackle this problem, we present Hypotheses-to-Theories (HtT), a framework that learns a rule library for reasoning with LLMs. HtT contains two stages, an induction stage and a deduction stage. In the induction stage, an LLM is first asked to generate and verify rules over a set of training examples. Rules that appear and lead to correct answers sufficiently often are collected to form a rule library. In the deduction stage, the LLM is then prompted to employ the learned rule library to perform reasoning to answer test questions. Experiments on relational reasoning, numerical reasoning and concept learning problems show that HtT improves existing prompting methods, with an absolute gain of 10-30% in accuracy. The learned rules are also transferable to different models and to different forms of the same problem.

LLMs Can Learn Rules: An Analytical Perspective

The paper "Large Language Models can Learn Rules" presents an approach to enhancing the reasoning capabilities of LLMs through a framework called Hypotheses-to-Theories (HtT). The core motivation behind HtT is to address hallucinations in LLMs, which occur when the model generates outputs that seem plausible but are incorrect. These often arise from a mismatch between the implicit knowledge the model acquires during pretraining and the explicit knowledge a specific reasoning task requires.

Framework Overview

HtT is structured into two main stages: an induction stage and a deduction stage. During the induction stage, the model is prompted to generate rules from a set of training examples and subsequently verify these rules. The rules that frequently lead to correct outcomes are collected into a rule library. The deduction stage involves using this learned rule library to guide the LLM in solving reasoning problems.
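The paper describes the induction stage procedurally rather than in code. As a minimal sketch, the rule-collection logic might look like the following, where `propose_rules` (one LLM call that returns an answer plus the rules it used) and `is_correct` are assumed helper functions, and the threshold values are illustrative, not taken from the paper:

```python
from collections import Counter

def build_rule_library(examples, propose_rules, is_correct,
                       min_count=2, min_accuracy=0.7):
    """Induction stage (sketch): collect rules the model proposes while
    answering training examples, then keep those that both appear often
    and usually lead to a correct answer."""
    occurrences = Counter()  # how often each rule is proposed
    successes = Counter()    # how often it appears in a correct answer
    for example in examples:
        answer, rules = propose_rules(example)  # hypothetical LLM call
        correct = is_correct(example, answer)
        for rule in rules:
            occurrences[rule] += 1
            if correct:
                successes[rule] += 1
    return [rule for rule, n in occurrences.items()
            if n >= min_count and successes[rule] / n >= min_accuracy]
```

The resulting library is then placed in the deduction prompt, so the model cites verified rules instead of improvising them.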

Empirical Results

The empirical evaluation demonstrates substantial improvements in accuracy, with HtT providing an absolute gain of 11-27% across reasoning tasks compared to baseline prompting methods. The tasks include numerical reasoning, exemplified by arithmetic in non-decimal base systems, and relational reasoning, demonstrated on the CLUTRR kinship benchmark. Notably, the learned rules transfer across models and across different formulations of the same problem.
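Non-decimal arithmetic is a natural stress test because a model's implicit, decimal-trained knowledge is actively misleading there. As an illustration (not the paper's prompt format), the ground-truth rules a model must learn amount to positional addition with a base-dependent carry:

```python
def add_base_n(a, b, base=9):
    """Add two numbers given as digit strings in the stated base.
    In base 9, 5 + 7 = 13 (one nine plus three), not 12 --
    exactly the kind of fact a decimal-trained model misremembers."""
    total = int(a, base) + int(b, base)
    digits = ""
    while total:
        digits = str(total % base) + digits
        total //= base
    return digits or "0"
```

Under HtT, single-digit sums like "5 + 7 = 13 in base 9" are the unit-level rules collected into the library and retrieved during deduction.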

Technical Contributions

  1. Rule Generation and Verification: The induction stage uses the LLM itself to hypothesize rules and empirically verify them against training examples, reducing reliance on the model's implicit knowledge alone.
  2. Induction from Deduction: This strategy simplifies prompt engineering by merging rule generation and verification under a single deductive reasoning prompt, leveraging existing techniques like chain-of-thought.
  3. XML Tagging for Rule Retrieval: To enhance the model’s in-context retrieval abilities, the paper introduces an XML tagging mechanism that organizes rules hierarchically. This facilitates effective retrieval even with a large rule set.
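A small sketch of how such a tagged library might be assembled is shown below; the section names and formatting helper are made up for illustration, but the idea matches the paper's trick of letting the model generate a tag path to "retrieve" the relevant slice of a large rule set:

```python
def tag_rules(sections):
    """Format a rule library with hierarchical XML tags so a model can
    narrow in-context retrieval to one section at a time (a sketch of
    the paper's XML-tagging mechanism; tag names are hypothetical)."""
    lines = ["<rules>"]
    for name, rules in sections.items():
        lines.append(f"  <{name}>")
        lines += [f"    <rule>{r}</rule>" for r in rules]
        lines.append(f"  </{name}>")
    lines.append("</rules>")
    return "\n".join(lines)
```

Grouping rules under tags keeps any one retrieval step small even when the full library would overwhelm the model's attention.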

Implications and Future Directions

The development of HtT has significant implications for the field of AI, particularly in improving the robustness and accuracy of LLMs in reasoning tasks. By providing a method to learn and apply explicit knowledge, LLMs become less dependent on probabilistic guessing and more capable of structured reasoning. This could enhance their applicability in areas requiring high accuracy, such as legal reasoning, financial forecasting, and scientific discovery.

Future research could explore the scalability of this approach, especially for models with longer context lengths and larger rule libraries. Additionally, integrating fine-tuning for retrieval capabilities could further enhance performance. There remains an open challenge to refine the learning of complex rules that span multiple reasoning steps, potentially through more advanced machine learning techniques or integrating external symbolic reasoning systems.

Conclusion

The paper offers a promising direction for enhancing the reasoning abilities of LLMs by reducing hallucinations through explicit rule learning and application. The Hypotheses-to-Theories framework presents a structured method that leverages both the strengths of LLMs and the necessity for explicit reasoning, marking a notable advancement in computational reasoning methodologies.

Authors (7)
  1. Zhaocheng Zhu
  2. Yuan Xue
  3. Xinyun Chen
  4. Denny Zhou
  5. Jian Tang
  6. Dale Schuurmans
  7. Hanjun Dai
Citations (50)