
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics (2410.21272v1)

Published 28 Oct 2024 in cs.CL

Abstract: Do LLMs solve reasoning tasks by learning robust generalizable algorithms, or do they memorize training data? To investigate this question, we use arithmetic reasoning as a representative task. Using causal analysis, we identify a subset of the model (a circuit) that explains most of the model's behavior for basic arithmetic logic and examine its functionality. By zooming in on the level of individual circuit neurons, we discover a sparse set of important neurons that implement simple heuristics. Each heuristic identifies a numerical input pattern and outputs corresponding answers. We hypothesize that the combination of these heuristic neurons is the mechanism used to produce correct arithmetic answers. To test this, we categorize each neuron into several heuristic types-such as neurons that activate when an operand falls within a certain range-and find that the unordered combination of these heuristic types is the mechanism that explains most of the model's accuracy on arithmetic prompts. Finally, we demonstrate that this mechanism appears as the main source of arithmetic accuracy early in training. Overall, our experimental results across several LLMs show that LLMs perform arithmetic using neither robust algorithms nor memorization; rather, they rely on a "bag of heuristics".

Arithmetic Without Algorithms: LLMs Solve Math with a Bag of Heuristics

This paper investigates whether LLMs solve arithmetic reasoning tasks using robust algorithms or whether their behavior is primarily driven by memorization. Employing causal analysis, the authors identify and analyze the circuits responsible for arithmetic in several LLMs, finding that arithmetic behavior is explained by a structured assembly of heuristic neurons rather than by a coherent algorithm or pure memorization.

The research introduces a novel perspective by examining arithmetic reasoning at the level of individual neurons within the identified circuits. Each important neuron implements a simple heuristic, such as activating when an operand falls within a particular range or matches a numerical pattern. Together, these heuristics form a "bag of heuristics": the operational structure LLMs use to solve arithmetic tasks.
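
To make "heuristic" concrete: one way to test whether a neuron implements, say, a first-operand range heuristic is to record its activation over a grid of operand pairs and check whether the firing region forms one contiguous band. The sketch below is illustrative rather than the paper's code; `neuron_activation(a, b)` is a hypothetical helper returning the neuron's post-nonlinearity activation on the prompt `f"{a}+{b}="`, and the thresholds are arbitrary.

```python
import numpy as np

def looks_like_operand_range_heuristic(neuron_activation, lo=0, hi=300, thresh=0.5):
    """Check: does this neuron fire (almost) iff the first operand lies
    in one contiguous range, regardless of the second operand?"""
    # Activation over a grid of (a, b) operand pairs.
    grid = np.array([[neuron_activation(a, b) for b in range(lo, hi)]
                     for a in range(lo, hi)])
    firing = (grid > thresh).mean(axis=1)      # firing rate per value of a
    in_range = firing > 0.9                    # rows where the neuron is "on"
    if not in_range.any():
        return False
    first = np.argmax(in_range)                               # first "on" row
    last = len(in_range) - 1 - np.argmax(in_range[::-1])      # last "on" row
    outside = firing[~in_range]
    # One contiguous on-block, and (nearly) silent everywhere else.
    return in_range[first:last + 1].all() and (outside.size == 0 or outside.max() < 0.1)
```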

Key Findings and Methodology

  1. Circuit Localization: The authors run activation patching experiments across several LLMs, including Llama3-8B, Llama3-70B, Pythia-6.9B, and GPT-J, to identify the model components (MLPs and attention heads) causally responsible for arithmetic computations. These components collectively form what the authors call arithmetic circuits (a minimal patching sketch appears after this list).
  2. Sparse Neuronal Contribution: A surprisingly sparse subset of neurons within these circuits suffices to maintain the model's arithmetic accuracy. These neuron sets differ across operations, indicating operator-specific heuristic implementations.
  3. Neuron-Level Heuristics: Using the Logit Lens to project each neuron's value vector onto numerical tokens, the paper reveals two prominent heuristic types: direct heuristics, which boost the logits of candidate result tokens, and indirect heuristics, which shape intermediate features. Neurons typically activate when an operand or the result satisfies a specific numerical condition, such as lying in a given range or being congruent to a value modulo some base (see the Logit Lens sketch after this list).
  4. Bag of Heuristics: The combined effect of many independent heuristic neurons is the core mechanism behind the models' arithmetic behavior. Ablating the neurons associated with a particular heuristic causes a marked drop in accuracy, concentrated on the prompts that heuristic should handle (see the ablation sketch after this list). This supports the claim that correct arithmetic completion relies on combined heuristic effects rather than on any single neuron.
  5. Training Dynamics: The bag of heuristics emerges as the main source of arithmetic accuracy early in training and is refined rather than replaced; no more algorithmic mechanism supersedes it later. This trajectory suggests that models lock into heuristic-based solutions early, which may contribute to over-specialization.
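
The circuit-localization step (finding 1) can be sketched with activation patching: cache activations on a clean prompt, run a corrupted prompt, and restore one component at a time to see how much of the clean answer it recovers. The snippet below uses the TransformerLens library with a small GPT-2 as a quickly-loadable stand-in for the models actually studied (it will not compute the sum correctly; the sketch only shows the mechanics), and the prompts and focus on MLP outputs are illustrative, not the authors' exact setup.

```python
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")  # stand-in; paper uses Llama3, Pythia, GPT-J

clean   = model.to_tokens("226+68=")   # prompt whose answer we want to localize
corrupt = model.to_tokens("326+68=")   # same token structure, different first operand
_, clean_cache = model.run_with_cache(clean)
answer = model.to_tokens("294", prepend_bos=False)[0, 0]  # first token of the clean answer

def restore_clean(act, hook):
    # Replace the corrupted run's activation with the cached clean one.
    return clean_cache[hook.name]

recovered = []
for layer in range(model.cfg.n_layers):
    name = utils.get_act_name("mlp_out", layer)  # "blocks.{layer}.hook_mlp_out"
    logits = model.run_with_hooks(corrupt, fwd_hooks=[(name, restore_clean)])
    # A high answer logit means this MLP carries information needed for the answer.
    recovered.append(logits[0, -1, answer].item())
```

Components whose restoration recovers the clean answer's logit are candidate members of the arithmetic circuit.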
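
Finding 3's neuron-level view relies on the Logit Lens: multiply a neuron's value vector (its row of the MLP down-projection matrix) by the unembedding and inspect which tokens it promotes. Continuing with the TransformerLens conventions above, where the layer and neuron indices are arbitrary placeholders:

```python
# Project one MLP neuron's value vector onto the vocabulary (Logit Lens).
layer, neuron = 9, 1337                       # illustrative indices, not from the paper
value_vec = model.W_out[layer, neuron]        # shape [d_model]
token_scores = value_vec @ model.W_U          # shape [d_vocab]

top_tokens = model.to_str_tokens(token_scores.topk(10).indices)
print(top_tokens)  # a "direct heuristic" neuron surfaces plausible result tokens here
```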
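
Finding 4's test can be sketched as targeted ablation: silence the neurons tagged with one heuristic and measure accuracy on arithmetic prompts. The layer/neuron assignments below are hypothetical placeholders, not neurons the paper identifies.

```python
heuristic_neurons = {9: [17, 512, 2048]}       # hypothetical {layer: [neuron ids]}

def zero_out(neuron_ids):
    def hook(act, hook):
        act[..., neuron_ids] = 0.0             # silence these MLP hidden neurons
        return act
    return hook

ablation_hooks = [(utils.get_act_name("post", layer), zero_out(ids))
                  for layer, ids in heuristic_neurons.items()]

def accuracy(prompts, answers):
    correct = 0
    for prompt, ans in zip(prompts, answers):
        logits = model.run_with_hooks(model.to_tokens(prompt), fwd_hooks=ablation_hooks)
        pred = logits[0, -1].argmax()
        correct += int(pred == model.to_tokens(ans, prepend_bos=False)[0, 0])
    return correct / len(prompts)
```

The paper's prediction is that accuracy falls most sharply on exactly the prompts the ablated heuristic was responsible for.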

Implications and Future Directions

The findings challenge the notion that LLMs internalize robust algorithmic procedures, instead emphasizing a reliance on many narrow heuristics. This complicates interpretability: the heuristic combinations, though effective in-distribution, may not generalize to unseen or out-of-distribution inputs.

Practical implications include a potential need for architectures or training paradigms that promote generalizable problem-solving over collections of memorized heuristics. Theoretically, interpretability frameworks may need to account for mechanisms like this one, which sit between clean algorithms and pure memorization.

Future research could explore regularization techniques that foster robust generalization, or examine whether similar heuristic structures arise in reasoning tasks beyond arithmetic. Investigating alternative architectures that might naturally avoid heuristic over-reliance presents a compelling avenue for advancing LLM capabilities.

This paper offers a meticulous dissection of arithmetic reasoning in LLMs, paving the way for a better understanding of LLM computation and for future models that move beyond heuristic dependence.

Authors (4)
  1. Yaniv Nikankin
  2. Anja Reusch
  3. Aaron Mueller
  4. Yonatan Belinkov