Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Abstract: Do LLMs solve reasoning tasks by learning robust, generalizable algorithms, or by memorizing training data? To investigate this question, we use arithmetic reasoning as a representative task. Using causal analysis, we identify a subset of the model (a circuit) that explains most of the model's behavior on basic arithmetic and examine its functionality. Zooming in to the level of individual circuit neurons, we discover a sparse set of important neurons that implement simple heuristics: each heuristic identifies a numerical input pattern and outputs corresponding answers. We hypothesize that the combination of these heuristic neurons is the mechanism that produces correct arithmetic answers. To test this, we categorize each neuron into several heuristic types, such as neurons that activate when an operand falls within a certain range, and find that the unordered combination of these heuristic types is the mechanism that explains most of the model's accuracy on arithmetic prompts. Finally, we demonstrate that this mechanism emerges as the main source of arithmetic accuracy early in training. Overall, our experimental results across several LLMs show that LLMs perform arithmetic using neither robust algorithms nor memorization; rather, they rely on a "bag of heuristics".
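To build intuition for the mechanism the abstract describes, here is a toy sketch, not the paper's implementation, of how several simple pattern-based heuristics can combine to yield a correct arithmetic answer. The specific heuristic rules (operand ranges, parity, last digit) and the set-intersection combination are illustrative assumptions, not findings reported in the paper.

```python
# Toy "bag of heuristics" for addition: each heuristic fires on a numerical
# input pattern and votes for a set of candidate answers; intersecting the
# votes of all firing heuristics narrows the candidates to the correct sum.

def range_heuristic(lo1, hi1, lo2, hi2):
    """Fires when the operands fall in the given ranges; votes for all
    sums those ranges could produce."""
    def h(a, b):
        if lo1 <= a < hi1 and lo2 <= b < hi2:
            # Possible sums span [lo1 + lo2, (hi1 - 1) + (hi2 - 1)].
            return set(range(lo1 + lo2, hi1 + hi2 - 1))
        return None  # heuristic does not fire on this input
    return h

def parity_heuristic(a, b):
    """Votes for answers whose parity matches a + b."""
    return {n for n in range(200) if n % 2 == (a + b) % 2}

def last_digit_heuristic(a, b):
    """Votes for answers whose last digit matches (a + b) mod 10."""
    return {n for n in range(200) if n % 10 == (a + b) % 10}

def combine(a, b, heuristics):
    """Intersect the votes of every heuristic that fires on (a, b)."""
    candidates = set(range(200))
    for h in heuristics:
        votes = h(a, b)
        if votes is not None:
            candidates &= votes
    return candidates

heuristics = [
    range_heuristic(20, 25, 60, 65),  # fires for a in [20, 25), b in [60, 65)
    parity_heuristic,
    last_digit_heuristic,
]
print(combine(23, 64, heuristics))  # → {87}
```

No single heuristic computes the sum: the range heuristic alone leaves nine candidates (80 through 88), and parity and last-digit constraints alone match many numbers. Only their unordered combination isolates 87, mirroring the paper's claim that accuracy arises from the joint effect of many weak heuristics rather than a single algorithm.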