Locating and Editing Factual Associations in Mamba (2404.03646v2)
Abstract: We investigate the mechanisms of factual recall in the Mamba state space model. Our work is inspired by previous findings in autoregressive transformer LLMs suggesting that their knowledge recall is localized to particular modules at specific token locations; we therefore ask whether factual recall in Mamba can be similarly localized. To answer this question, we conduct four lines of experiments on Mamba. First, we apply causal tracing (interchange interventions) to localize the components inside Mamba that are responsible for recalling facts. This reveals that specific components within middle layers show strong causal effects at the last token of the subject, while intervening on later layers has its most pronounced effect at the last token of the prompt, matching previous findings on autoregressive transformers. Second, we show that rank-one model editing methods can successfully insert facts at specific locations, again resembling findings on transformer LMs. Third, we examine the linearity of Mamba's representations of factual relations. Finally, we adapt attention-knockout techniques to Mamba to dissect information flow during factual recall. Comparing Mamba directly to a similar-sized autoregressive transformer LM, we conclude that, despite significant differences in architectural approach, the two models share many similarities when it comes to factual recall.
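To make the first experiment concrete, the sketch below shows the core of an interchange intervention (causal tracing): record an activation from a clean run, patch it into a corrupted run at one (layer, token) location, and check how much the next-token prediction recovers. This is a minimal illustration, not the paper's exact procedure: `model`, `tokenizer`, and `layer_module` are placeholders for a HuggingFace-style causal LM and one of its submodules, and a real Mamba block exposes several internal streams rather than a single output tensor.

```python
import torch

@torch.no_grad()
def interchange_intervention(model, tokenizer, clean_prompt,
                             corrupted_prompt, layer_module, token_idx):
    """Run the corrupted prompt while patching in the clean activation of
    `layer_module` at position `token_idx`; return next-token logits."""
    clean_ids = tokenizer(clean_prompt, return_tensors="pt").input_ids
    corr_ids = tokenizer(corrupted_prompt, return_tensors="pt").input_ids
    # Assumes both prompts tokenize to the same length, as in causal-tracing
    # setups where the corruption merely noises the subject tokens.
    cache = {}

    def save_hook(module, args, output):
        # Assumes the hooked module returns a single (batch, seq, dim) tensor.
        cache["act"] = output

    def patch_hook(module, args, output):
        patched = output.clone()
        patched[:, token_idx] = cache["act"][:, token_idx]
        return patched  # returning a value from a forward hook replaces the output

    # 1) Clean run: record the activation to be restored later.
    handle = layer_module.register_forward_hook(save_hook)
    model(clean_ids)
    handle.remove()

    # 2) Corrupted run with the clean activation patched back in.
    handle = layer_module.register_forward_hook(patch_hook)
    logits = model(corr_ids).logits
    handle.remove()

    return logits[:, -1]
```

Sweeping `layer_module` over layers and `token_idx` over positions, then measuring how much the probability of the correct answer recovers relative to the fully corrupted baseline, produces causal-effect maps of the kind the abstract describes.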
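The second experiment rests on viewing a linear projection as an associative key-value memory: a rank-one outer-product update rewrites the value stored for one key while changing the weight as little as possible. The sketch below shows only that core algebra under simplifying assumptions; ROME-style methods additionally whiten the key with pre-computed covariance statistics and derive `k` and `v_new` from model activations, both of which are omitted here.

```python
import torch

def rank_one_edit(W: torch.Tensor, k: torch.Tensor,
                  v_new: torch.Tensor) -> torch.Tensor:
    """Minimal-norm rank-one update such that the edited weight maps the
    key exactly to the new value: W' @ k == v_new.

    W:     (d_out, d_in) projection weight
    k:     (d_in,) key vector (e.g., a subject representation)
    v_new: (d_out,) value vector encoding the new fact
    """
    residual = v_new - W @ k                       # correction needed at k
    return W + torch.outer(residual, k) / (k @ k)  # rank-one update

# Toy usage: store a new value for one key and verify the edit.
torch.manual_seed(0)
W = torch.randn(8, 16)
k = torch.randn(16)
v_new = torch.randn(8)
W_edited = rank_one_edit(W, k, v_new)
assert torch.allclose(W_edited @ k, v_new, atol=1e-4)
```

Because the update is rank one, any input orthogonal to `k` is mapped exactly as before, which is why such edits can insert a fact at a specific location without retraining the model.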
Authors: Arnab Sen Sharma, David Atkinson, David Bau