Crafting Interpretable Embeddings by Asking LLMs Questions (2405.16714v1)

Published 26 May 2024 in cs.CL, cs.AI, cs.LG, and q-bio.NC

Abstract: LLMs have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing need for interpretability. Here, we ask whether we can obtain interpretable embeddings through LLM prompting. We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. Training QA-Emb reduces to selecting a set of underlying questions rather than learning model weights. We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli. QA-Emb significantly outperforms an established interpretable baseline, and does so while requiring very few questions. This paves the way towards building flexible feature spaces that can concretize and evaluate our understanding of semantic brain representations. We additionally find that QA-Emb can be effectively approximated with an efficient model, and we explore broader applications in simple NLP tasks.

The paper presents a technique for generating interpretable text embeddings, termed question-answering embeddings (QA-Emb). The work, by Vinamra Benara, Chandan Singh, John X. Morris, Richard Antonello, Ion Stoica, Alexander G. Huth, and Jianfeng Gao, spans institutions including UC Berkeley and Microsoft Research and sits at the intersection of machine learning, NLP, and neuroscience.

Problem Statement

Traditional methods for generating text embeddings, from bag-of-words representations to transformer-based models (e.g., BERT, LLaMA), produce opaque feature spaces that are difficult to interpret. This opacity poses significant challenges in domains that demand trustworthy interpretation, such as neuroscience. The authors propose QA-Emb to bridge this gap by making every dimension of the embedding human-interpretable: each dimension records the answer to a yes/no question posed to a pre-trained autoregressive LLM.

Methodology

QA-Emb constructs an embedding by querying an LLM with a set of yes/no questions about the input text. Each question's binary answer (mapped to 0 or 1) forms one dimension of the resulting embedding. Notably, the method requires neither fine-tuning the LLM nor access to its internal parameters; it relies only on carefully crafted natural-language prompts.
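
As an illustration, a minimal sketch of this construction might look as follows; the `ask_llm` helper is a hypothetical stand-in for any chat-completion API, and the prompt template is illustrative rather than the paper's exact wording:

```python
from typing import Callable, List

import numpy as np


def qa_embed(text: str, questions: List[str], ask_llm: Callable[[str], str]) -> np.ndarray:
    """Embed `text` as a binary vector: one yes/no answer per question."""
    answers = []
    for question in questions:
        prompt = f'Input text: "{text}"\nQuestion: {question}\nAnswer yes or no.'
        reply = ask_llm(prompt)
        # Each interpretable dimension is 1 for "yes", 0 otherwise.
        answers.append(1.0 if reply.strip().lower().startswith("yes") else 0.0)
    return np.array(answers)


# Example: a three-dimensional interpretable embedding.
questions = [
    "Does the text mention a person?",
    "Does the text describe a physical location?",
    "Is the sentiment of the text positive?",
]
```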

Learning the Set of Questions

The set of yes/no questions is selected to suit the downstream task. For predicting fMRI responses, the authors cast question selection as an optimization problem built around ridge regression: candidate questions are heuristically generated by prompting capable LLMs such as GPT-4, and redundant questions are then pruned with an elastic-net penalty.
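
One way this pruning step could be implemented is sketched below: fit an elastic net from the full question-answer matrix to the prediction target, then keep only the questions whose coefficients survive the sparsity penalty. The arrays here are synthetic stand-ins (the paper predicts many fMRI voxels rather than a single target):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
answer_matrix = rng.integers(0, 2, size=(500, 200)).astype(float)  # texts x candidate questions
y = rng.normal(size=500)  # toy prediction target

# The l1 component of the penalty zeroes out redundant questions.
enet = ElasticNetCV(l1_ratio=0.9, cv=5).fit(answer_matrix, y)
kept = np.flatnonzero(enet.coef_)
print(f"retained {kept.size} of {answer_matrix.shape[1]} candidate questions")
```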

Neuroscience Application

Focusing on a neuroscience application, the authors employ QA-Emb to predict human brain responses (measured with fMRI) to natural language stimuli. The data consist of recordings from subjects listening to narrative podcast stories; QA-Emb features of the stimuli feed ridge regression models that predict voxelwise fMRI responses. The results show a 26% improvement over the established interpretable baseline (Eng1000) and competitive performance against black-box embeddings from models such as BERT and LLaMA.
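
The encoding-model setup can be sketched as follows, with synthetic arrays standing in for the stimulus features and voxel responses; the evaluation metric is the average per-voxel test correlation reported in the paper:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X_train, X_test = rng.random((800, 29)), rng.random((200, 29))  # QA-Emb features per time point
Y_train, Y_test = rng.normal(size=(800, 1000)), rng.normal(size=(200, 1000))  # voxel responses

# Ridge regression fits all voxels jointly, with the penalty strength chosen by CV.
model = RidgeCV(alphas=np.logspace(0, 4, 10)).fit(X_train, Y_train)
pred = model.predict(X_test)

corrs = [np.corrcoef(pred[:, v], Y_test[:, v])[0, 1] for v in range(Y_test.shape[1])]
print(f"average test correlation: {np.mean(corrs):.3f}")
```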

Numerical Results

Key findings include:

  • QA-Emb outperforms Eng1000 by 26% in average test correlation.
  • Even with only 29 questions, QA-Emb exceeds the performance of Eng1000, which uses a much larger feature set, while remaining more interpretable.
  • QA-Emb reaches an average test correlation of 0.116, slightly better than BERT but about 7% below the best-performing LLaMA model.

Limitations and Optimizations

The authors cite two primary limitations: high computational cost and potential inaccuracies in the LLM's answers to the yes/no questions:

  1. Computational Cost: QA-Emb requires one LLM call per question, making it computationally intensive. To alleviate this, the authors explore model distillation, whereby a RoBERTa model predicts the answers to many questions in a single forward pass, yielding nearly equivalent performance at a fraction of the cost (see the sketch after this list).
  2. LLM Accuracy: The reliability of QA-Emb depends on the LLM's ability to answer the yes/no questions faithfully. Variability in LLM performance across diverse binary classification tasks underscores the need for strong models and careful prompt engineering.
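
Below is a hedged sketch of the distillation idea, using the Hugging Face transformers multi-label classification head so that one forward pass yields an answer for every question; training on LLM-generated labels is omitted, and the question count is illustrative:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

n_questions = 29  # one sigmoid output per yes/no question
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=n_questions,
    problem_type="multi_label_classification",  # BCE loss, one logit per question
)

text = "I walked along the beach as the storm rolled in."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
embedding = (torch.sigmoid(logits) > 0.5).float().squeeze(0)  # binary QA-Emb in one pass
```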

Broader Applications and Future Work

QA-Emb demonstrates potential applications beyond neuroscience, including information retrieval and text clustering, where it provides modest improvements and a high degree of interpretability. The paper outlines several avenues for future research:

  • Enhanced optimization techniques for selecting questions.
  • A broader range of applications in domains requiring interpretable text embeddings.
  • Improved discrete optimization methods and constraints for more direct optimization of QA-Emb.

Moreover, the authors highlight the societal benefits of interpretable AI systems and the importance of transparency in AI applications, especially in high-stakes fields such as medicine and social sciences.

Conclusion

In summary, QA-Emb offers a promising method for generating interpretable text embeddings by leveraging LLMs through strategic questioning. The approach pairs high interpretability with strong predictive performance, addressing a long-standing tension in embedding techniques and opening new pathways for applications across domains. As LLMs improve, QA-Emb stands to gain in both efficiency and capability, further cementing its utility in NLP and beyond.

Authors (7)
  1. Vinamra Benara
  2. Chandan Singh
  3. John X. Morris
  4. Richard Antonello
  5. Ion Stoica
  6. Alexander G. Huth
  7. Jianfeng Gao