Language Models Implement Simple Word2Vec-style Vector Arithmetic

(arXiv:2305.16130)
Published May 25, 2023 in cs.CL and cs.LG

Abstract

A primary criticism of language models (LMs) is their inscrutability. This paper presents evidence that, despite their size and complexity, LMs sometimes exploit a simple vector-arithmetic-style mechanism to solve relational tasks using regularities encoded in their hidden space (e.g., Poland:Warsaw::China:Beijing). We investigate a range of language model sizes (from 124M to 176B parameters) in an in-context learning setting, and find that for a variety of tasks (involving capital cities, uppercasing, and past-tensing) a key part of the mechanism reduces to a simple additive update, typically applied by the feed-forward (FFN) layers. We further show that this mechanism is specific to tasks that require retrieval from pretraining memory rather than retrieval from local context. Our results contribute to a growing body of work on the interpretability of LMs, and offer reason to be optimistic that, despite the massive and non-linear nature of these models, the strategies they ultimately use to solve tasks can sometimes reduce to familiar and even intuitive algorithms.
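To make the vector-arithmetic idea concrete, here is a minimal sketch in NumPy. The toy vectors, the shared `capital_offset`, and the `analogy` helper are illustrative assumptions, not the paper's code: real LM hidden states are learned, high-dimensional vectors, and the paper studies the analogous update inside the transformer rather than on static embeddings.

```python
import numpy as np

# Toy embeddings; names and values are hypothetical, for illustration only.
rng = np.random.default_rng(0)
dim = 8
emb = {w: rng.normal(size=dim) for w in ["poland", "warsaw", "china", "beijing"]}

# Impose the regularity the mechanism exploits: capital = country + shared offset.
capital_offset = rng.normal(size=dim)
emb["warsaw"] = emb["poland"] + capital_offset
emb["beijing"] = emb["china"] + capital_offset

def analogy(a, b, c, vocab):
    """Solve a:b::c:? by adding the (b - a) offset to c's vector and
    returning the nearest vocabulary word by cosine similarity."""
    query = vocab[c] + (vocab[b] - vocab[a])

    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    return max((w for w in vocab if w != c), key=lambda w: cos(vocab[w], query))

print(analogy("poland", "warsaw", "china", emb))  # -> beijing
```

Inside a transformer, an update of this form is natural because the residual stream is additive: each layer computes roughly h ← h + FFN(h), so a single FFN block can contribute an offset playing the role of `capital_offset` above, which is the kind of additive update the paper isolates.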
