Crafting Interpretable Embeddings by Asking LLMs Questions (2405.16714v1)
Abstract: LLMs have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opacity and proliferation into scientific domains such as neuroscience have created a growing need for interpretability. Here, we ask whether we can obtain interpretable embeddings through LLM prompting. We introduce question-answering embeddings (QA-Emb), embeddings in which each feature represents the answer to a yes/no question posed to an LLM. Training QA-Emb reduces to selecting a set of underlying questions rather than learning model weights. We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli. QA-Emb significantly outperforms an established interpretable baseline while requiring very few questions, paving the way toward flexible feature spaces that can concretize and evaluate our understanding of semantic brain representations. We additionally find that QA-Emb can be effectively approximated with an efficient model, and we explore broader applications in simple NLP tasks.
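The core idea lends itself to a short sketch. The snippet below is a minimal, hypothetical illustration of a QA-Emb-style embedding: each feature is the yes/no answer an LLM gives to one question about the input text. The `ask_llm` helper, the prompt format, and the example questions are assumptions for illustration, not the authors' implementation.

```python
from typing import Callable, List
import numpy as np

def qa_embed(
    text: str,
    questions: List[str],
    ask_llm: Callable[[str], str],
) -> np.ndarray:
    """Build a QA-Emb-style vector: one binary feature per yes/no question.

    `ask_llm` is a placeholder for any LLM call that returns a short answer
    string; the prompt wording here is an illustrative assumption.
    """
    features = []
    for question in questions:
        prompt = f'Input text: "{text}"\nQuestion: {question}\nAnswer yes or no.'
        answer = ask_llm(prompt).strip().lower()
        # Map the LLM's yes/no answer to a binary feature value.
        features.append(1.0 if answer.startswith("yes") else 0.0)
    return np.array(features)

# Hypothetical usage: a handful of hand-written questions defines the feature space.
questions = [
    "Does the text mention a person?",
    "Is the text about a place or location?",
    "Does the text describe an emotion?",
]
# embedding = qa_embed("I drove home feeling relieved.", questions, ask_llm=my_llm)
# -> e.g. array([0., 0., 1.]), a 3-dimensional interpretable embedding
```

Because each dimension is tied to a named question, a downstream linear model (for example, a regularized regression predicting fMRI voxel responses from these features) stays interpretable: its weights directly indicate how much each question's answer contributes to the prediction.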
Authors: Vinamra Benara, Chandan Singh, John X. Morris, Richard Antonello, Ion Stoica, Alexander G. Huth, Jianfeng Gao