Interpretable Language Modeling via Induction-head Ngram Models (2411.00066v1)
Abstract: Recent large language models (LLMs) have excelled across a wide range of tasks, but their use in high-stakes and compute-limited settings has intensified the demand for interpretability and efficiency. We address this need by proposing Induction-head ngram models (Induction-Gram), a method that builds an efficient, interpretable LM by bolstering modern ngram models with a hand-engineered "induction head". This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions, enabling Induction-Gram to provide ngram-level grounding for each generated token. Moreover, experiments show that this simple method significantly improves next-word prediction over baseline interpretable models (by up to 26 percentage points) and can speed up inference for large LLMs through speculative decoding. We further study Induction-Gram in a natural-language neuroscience setting, where the goal is to predict the next fMRI response in a sequence. It again provides a significant improvement over interpretable models (a 20% relative increase in the correlation of predicted fMRI responses), potentially enabling deeper scientific investigation of language selectivity in the brain. The code is available at https://github.com/ejkim47/induction-gram.
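To make the mechanism concrete, the sketch below is a minimal illustration of the idea described in the abstract, not the authors' implementation: a count-based ngram fallback combined with an induction-style lookup that searches the input context for windows similar to the current suffix and proposes the tokens that followed them. The `toy_embed` stand-in (a hashed random-vector embedding), the `suffix_len` and `threshold` parameters, and all function names are assumptions for illustration; the paper's method instead uses a learned neural similarity metric (see the linked repository for the actual code).

```python
import hashlib
from collections import Counter, defaultdict

import numpy as np


def toy_embed(window, dim=64):
    """Toy stand-in for a neural similarity metric: average of hash-seeded random vectors."""
    vecs = []
    for tok in window:
        seed = int(hashlib.md5(str(tok).encode()).hexdigest()[:8], 16)
        vecs.append(np.random.default_rng(seed).standard_normal(dim))
    return np.mean(vecs, axis=0)


def ngram_next_token_counts(tokens, n=2):
    """Classic ngram table: for each (n-1)-token prefix, count the tokens that follow it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        table[tuple(tokens[i : i + n - 1])][tokens[i + n - 1]] += 1
    return table


def induction_candidates(tokens, embed, suffix_len=3, threshold=0.8):
    """Induction-style lookup: score earlier windows by similarity to the current suffix
    and collect their follower tokens, weighted by match similarity."""
    query = embed(tokens[-suffix_len:])
    candidates = Counter()
    for i in range(suffix_len, len(tokens)):  # window tokens[i-suffix_len:i] is followed by tokens[i]
        key = embed(tokens[i - suffix_len : i])
        sim = float(query @ key / (np.linalg.norm(query) * np.linalg.norm(key) + 1e-8))
        if sim >= threshold:
            candidates[tokens[i]] += sim
    return candidates


def predict_next(tokens, embed, n=2):
    """Prefer a match retrieved from the input context; otherwise back off to ngram counts."""
    from_context = induction_candidates(tokens, embed)
    if from_context:
        return from_context.most_common(1)[0][0]
    table = ngram_next_token_counts(tokens, n=n)
    prefix = tuple(tokens[-(n - 1):])
    return table[prefix].most_common(1)[0][0] if table[prefix] else None


if __name__ == "__main__":
    toks = "the cat sat on the mat . the cat sat on the".split()
    print(predict_next(toks, toy_embed))  # the repeated suffix "sat on the" grounds the prediction "mat"
```

Because each prediction is traced either to a matched span in the context or to an explicit ngram count, every generated token comes with a human-readable justification, which is the sense in which the method is "interpretable" and why its context matches can also serve as cheap draft proposals for speculative decoding.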