Understanding Ranking LLMs: A Mechanistic Analysis for Information Retrieval
Abstract: Transformer networks, particularly those achieving performance comparable to GPT models, are well known for their robust feature extraction abilities. However, the nature of these extracted features and how well they align with human-engineered ones remain unexplored. In this work, we investigate the internal mechanisms of state-of-the-art, fine-tuned LLMs for passage reranking. Using a probing-based analysis, we examine neuron activations in ranking LLMs to identify the presence of known human-engineered and semantic features. Our study spans a broad range of feature categories, including lexical signals, document structure, query-document interactions, and complex semantic representations, to uncover the patterns that underlie ranking decisions. Through experiments on four ranking LLMs, we identify statistical IR features that are prominently encoded in LLM activations, as well as others that are notably absent. We further analyze how these models respond to out-of-distribution queries and documents, revealing distinct generalization behaviors. By dissecting the latent representations within LLM activations, we aim to improve both the interpretability and effectiveness of ranking models. Our findings offer crucial insights for developing more transparent and reliable retrieval systems, and we release all necessary scripts and code to support further exploration.
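To make the probing setup concrete, the sketch below trains a simple linear probe on pooled hidden activations to predict a hand-engineered lexical signal (query-term overlap). This is a minimal illustration of the general technique, not the paper's released code: "gpt2" stands in for a fine-tuned ranking LLM, the handful of query-passage pairs stands in for MS MARCO-scale data, and the layer choice, input template, and mean pooling are assumptions made for brevity.

```python
# Minimal probing sketch (illustrative assumptions, not the paper's pipeline):
# train a linear probe on pooled hidden activations to predict a
# hand-engineered lexical IR signal (query-term overlap).
import numpy as np
import torch
from sklearn.linear_model import Ridge
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

def activations(query: str, doc: str, layer: int = 6) -> np.ndarray:
    """Mean-pooled hidden states for one query-document pair at a chosen layer."""
    enc = tok(f"query: {query} document: {doc}",
              return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        out = model(**enc)
    return out.hidden_states[layer].mean(dim=1).squeeze(0).numpy()

def term_overlap(query: str, doc: str) -> float:
    """Target feature: fraction of query terms that appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

pairs = [
    ("capital of france", "paris is the capital and largest city of france"),
    ("capital of france", "the eiffel tower attracts millions of visitors"),
    ("python list sort", "use the sort method to sort a python list in place"),
    ("python list sort", "java arrays are sorted with a different api"),
    # ...in practice, thousands of query-passage pairs would be used.
]

X = np.stack([activations(q, d) for q, d in pairs])
y = np.array([term_overlap(q, d) for q, d in pairs])

probe = Ridge(alpha=1.0).fit(X, y)
print("probe R^2 on the training pairs:", probe.score(X, y))
# In a real analysis, the probe would be evaluated on held-out pairs and
# compared against a baseline trained on shuffled or random activations.
```

A high held-out R^2 relative to such a randomized baseline would indicate that the lexical feature is linearly decodable from that layer, which is the kind of evidence the abstract refers to when it describes IR features being prominently encoded in LLM activations.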