Function Vectors in Large Language Models (2310.15213v2)
Abstract: We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer LMs. Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV). FVs are robust to changes in context, i.e., they trigger execution of the task on inputs such as zero-shot and natural text settings that do not resemble the ICL contexts from which they are collected. We test FVs across a range of tasks, models, and layers and find strong causal effects across settings in middle layers. We investigate the internal structure of FVs and find that while they often contain information that encodes the output space of the function, this information alone is not sufficient to reconstruct an FV. Finally, we test semantic vector composition in FVs, and find that to some extent they can be summed to create vectors that trigger new complex tasks. Our findings show that compact, causal internal vector representations of function abstractions can be explicitly extracted from LLMs. Our code and data are available at https://functions.baulab.info.
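The mechanism described in the abstract lends itself to a short sketch: collect the outputs of a handful of causally important attention heads at the final token of ICL prompts, average them per head, sum the averages into a single function vector, and add that vector to a middle-layer hidden state of an unrelated prompt. The PyTorch snippet below is a minimal illustration under stated assumptions, not the paper's released code; the layer/head indices, tensor shapes, and the `head_outputs` placeholder are hypothetical stand-ins for activations one would collect with forward hooks.

```python
import torch

# Illustrative dimensions (not tied to any specific model).
d_model, n_prompts = 4096, 100

# Assume head_outputs[(layer, head)] holds each selected attention head's output
# at the final token of n_prompts ICL prompts for one task, already projected
# into the residual stream (shape [n_prompts, d_model]). Random tensors stand in
# for real activations here.
head_outputs = {
    (9, 3): torch.randn(n_prompts, d_model),
    (12, 7): torch.randn(n_prompts, d_model),
    (14, 1): torch.randn(n_prompts, d_model),
}

# A function vector is, roughly, the sum over a small set of causally important
# heads of their mean task-conditioned outputs.
function_vector = sum(acts.mean(dim=0) for acts in head_outputs.values())

# To trigger the task zero-shot, the FV is added to the residual-stream hidden
# state at a middle layer for the last token of an unrelated prompt; only the
# arithmetic is shown here.
hidden_state = torch.randn(d_model)           # hidden state at some middle layer
patched_state = hidden_state + function_vector
```

The composition result in the abstract corresponds, in this picture, to summing two function vectors (e.g., one for each subtask) before patching them into the hidden state.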