
Axiomatic Causal Interventions for Reverse Engineering Relevance Computation in Neural Retrieval Models (2405.02503v1)

Published 3 May 2024 in cs.IR

Abstract: Neural models have demonstrated remarkable performance across diverse ranking tasks. However, the processes and internal mechanisms by which they determine relevance are still largely unknown. Existing approaches for analyzing neural ranker behavior with respect to IR properties rely on either assessing overall model behavior or employing probing methods that may offer an incomplete understanding of causal mechanisms. To provide a more granular understanding of internal model decision-making processes, we propose the use of causal interventions to reverse engineer neural rankers, and demonstrate how mechanistic interpretability methods can be used to isolate components satisfying term-frequency axioms within a ranking model. We identify a group of attention heads that detect duplicate tokens in earlier layers of the model, then communicate with downstream heads to compute overall document relevance. More generally, we propose that this style of mechanistic analysis opens up avenues for reverse engineering the processes neural retrieval models use to compute relevance. This work aims to initiate granular interpretability efforts that will not only benefit retrieval model development and training, but ultimately ensure safer deployment of these models.
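
As a concrete illustration, the sketch below shows the activation-patching style of causal intervention the abstract describes, written against the TransformerLens library the authors use. It is a minimal sketch under stated assumptions: GPT-2 stands in for a neural ranker, the layer/head indices are hypothetical, and the toy query/document strings are invented for illustration rather than drawn from the paper.

from transformer_lens import HookedTransformer

# GPT-2 as a stand-in for a neural ranking model (illustrative assumption).
model = HookedTransformer.from_pretrained("gpt2")

# The "clean" input repeats the query term in the document; the "corrupted"
# input replaces the duplicates. Both strings tokenize to the same length.
clean = "query: cat document: the cat sat on the cat mat"
corrupt = "query: cat document: the dog sat on the red mat"

_, clean_cache = model.run_with_cache(clean)

def patch_head(z, hook, head_index):
    # Overwrite one attention head's output on the corrupted run with its
    # activation from the clean run; hook_z has shape
    # [batch, position, head_index, d_head].
    z[:, :, head_index, :] = clean_cache[hook.name][:, :, head_index, :]
    return z

layer, head = 3, 5  # hypothetical early-layer "duplicate token" head
patched_logits = model.run_with_hooks(
    corrupt,
    fwd_hooks=[(f"blocks.{layer}.attn.hook_z",
                lambda z, hook: patch_head(z, hook, head))],
)
baseline_logits = model(corrupt)

# The difference between patched and baseline outputs measures the head's
# causal effect; in the paper's setting, the measured quantity would be the
# ranker's relevance score rather than next-token logits.
print((patched_logits - baseline_logits).abs().max())

Iterating this patch over every head, using paired inputs that differ only in term frequency, is how such an analysis can localize the components responsible for satisfying a term-frequency axiom.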

Authors (3)
  1. Catherine Chen (11 papers)
  2. Jack Merullo (15 papers)
  3. Carsten Eickhoff (75 papers)
Citations (2)
