Case-Based or Rule-Based: How Do Transformers Do the Math? (2402.17709v2)

Published 27 Feb 2024 in cs.AI and cs.CL

Abstract: Despite their impressive performance on a variety of complex tasks, modern LLMs still struggle with some math problems that are simple and intuitive for humans, such as addition. While humans can easily learn the basic rules of addition and apply them to new problems of any length, LLMs struggle to do the same; instead, they may rely on similar cases seen in the training corpus. We define these two reasoning mechanisms as "rule-based reasoning" and "case-based reasoning". Since rule-based reasoning is essential for acquiring systematic generalization ability, we aim to determine whether transformers use rule-based or case-based reasoning for math problems. Through carefully designed intervention experiments on five math tasks, we confirm that transformers perform case-based reasoning regardless of whether a scratchpad is used, which aligns with previous observations that transformers reason via subgraph matching/shortcut learning. To mitigate this problem, we propose Rule-Following Fine-Tuning (RFFT), a technique that teaches transformers to perform rule-based reasoning. Specifically, we provide explicit rules in the input and then instruct the transformer to recite and follow the rules step by step. Through RFFT, we enable LLMs fine-tuned on 1-5 digit addition to generalize to up to 12-digit addition with over 95% accuracy, more than 40% higher than with scratchpad. This significant improvement demonstrates that teaching LLMs to use rules explicitly helps them learn rule-based reasoning and generalize better in length.
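
As an illustration of the RFFT recipe described in the abstract, the sketch below builds one rule-following training example for addition: the explicit rule goes into the input, and the target recites the rule's application digit by digit. The abstract does not specify the exact prompt or target format, so the rule wording, the step layout, and the build_rfft_example helper are assumptions chosen only to convey the idea, not the paper's actual data format.

```python
# Minimal sketch of an RFFT-style training example for addition.
# The concrete formatting below is an assumption; only the idea
# (explicit rule in the input, step-by-step rule-following in the
# target) comes from the paper's abstract.

ADDITION_RULE = (
    "Rule: to add two numbers, process the digits from right to left. "
    "At each position, add the two digits and the carry; write down the "
    "last digit of that sum and carry the rest to the next position. "
    "When all positions are done, prepend any remaining carry."
)

def build_rfft_example(a: int, b: int) -> dict:
    """Build one (input, target) pair with the rule in the input and a
    step-by-step trace of following the rule in the target."""
    da, db = str(a)[::-1], str(b)[::-1]        # digits, least-significant first
    steps, out_digits, carry = [], [], 0
    for i in range(max(len(da), len(db))):
        x = int(da[i]) if i < len(da) else 0
        y = int(db[i]) if i < len(db) else 0
        s = x + y + carry
        out_digits.append(str(s % 10))
        steps.append(
            f"Position {i + 1}: {x} + {y} + carry {carry} = {s}; "
            f"write {s % 10}, carry {s // 10}."
        )
        carry = s // 10
    if carry:
        out_digits.append(str(carry))
        steps.append(f"Final carry {carry} is prepended.")
    answer = "".join(reversed(out_digits))
    return {
        "input": f"{ADDITION_RULE}\nFollow the rule to compute {a} + {b}.",
        "target": "\n".join(steps) + f"\nAnswer: {answer}",
    }

if __name__ == "__main__":
    example = build_rfft_example(987, 46)
    print(example["input"])
    print(example["target"])   # digit-by-digit trace ending in "Answer: 1033"
```

Fine-tuning on such (input, target) pairs with 1-5 digit operands corresponds to the training setting described in the abstract; the claim there is that this explicit rule-recitation supervision, rather than a plain scratchpad trace, is what allows generalization to 12-digit addition.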

Authors (4)
  1. Yi Hu (129 papers)
  2. Xiaojuan Tang (5 papers)
  3. Haotong Yang (11 papers)
  4. Muhan Zhang (89 papers)
Citations (11)
