
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis (2305.15054v2)

Published 24 May 2023 in cs.CL and cs.LG

Abstract: Mathematical reasoning in large language models (LMs) has garnered significant attention in recent work, but there is a limited understanding of how these models process and store information related to arithmetic tasks within their architecture. To improve our understanding of this aspect of language models, we present a mechanistic interpretation of Transformer-based LMs on arithmetic questions using a causal mediation analysis framework. By intervening on the activations of specific model components and measuring the resulting changes in predicted probabilities, we identify the subset of parameters responsible for specific predictions. This provides insights into how information related to arithmetic is processed by LMs. Our experimental results indicate that LMs process the input by transmitting the information relevant to the query from mid-sequence early layers to the final token using the attention mechanism. Then, this information is processed by a set of MLP modules, which generate result-related information that is incorporated into the residual stream. To assess the specificity of the observed activation dynamics, we compare the effects of different model components on arithmetic queries with other tasks, including number retrieval from prompts and factual knowledge questions.
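The intervention procedure the abstract describes (patch a component's activation from one run into another, then measure the change in the output) can be sketched on a toy stand-in model. This is a minimal illustration, not the paper's implementation: all names are hypothetical, and the real experiments intervene on attention and MLP activations of Transformer LMs rather than this scalar toy.

```python
# Minimal sketch of causal mediation via activation patching.
# 'toy_model' stands in for a network with one mediating component;
# in the paper, the mediator is a specific attention head or MLP module.

def toy_model(x, patched_mid=None):
    """Toy 'network': input -> mid activation -> output."""
    mid = 2.0 * x            # stand-in for one component's activation
    if patched_mid is not None:
        mid = patched_mid    # intervention: overwrite this activation
    return mid + 1.0         # stand-in for the rest of the network

# Two contrasting inputs (e.g. arithmetic queries with different operands).
clean_x, corrupt_x = 3.0, 5.0

clean_out = toy_model(clean_x)      # baseline run
corrupt_out = toy_model(corrupt_x)  # contrasting run

# Indirect effect of the component: rerun the corrupted input, but patch
# in the activation recorded from the clean run, and measure how much the
# output moves back toward the clean baseline.
clean_mid = 2.0 * clean_x
patched_out = toy_model(corrupt_x, patched_mid=clean_mid)
indirect_effect = patched_out - corrupt_out
print(indirect_effect)
```

Because this toy has a single mediator on the only path from input to output, patching it fully restores the clean output; in a real LM, comparing this effect across components is what localizes the parameters responsible for a prediction.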
