
Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking (2310.13243v1)

Published 20 Oct 2023 in cs.IR and cs.CL

Abstract: In the field of information retrieval, Query Likelihood Models (QLMs) rank documents based on the probability of generating the query given the content of a document. Recently, advanced LLMs have emerged as effective QLMs, showcasing promising ranking capabilities. This paper focuses on investigating the genuine zero-shot ranking effectiveness of recent LLMs, which are solely pre-trained on unstructured text data without supervised instruction fine-tuning. Our findings reveal the robust zero-shot ranking ability of such LLMs, highlighting that additional instruction fine-tuning may hinder effectiveness unless a question generation task is present in the fine-tuning dataset. Furthermore, we introduce a novel state-of-the-art ranking system that integrates LLM-based QLMs with a hybrid zero-shot retriever, demonstrating exceptional effectiveness in both zero-shot and few-shot scenarios. We make our codebase publicly available at https://github.com/ielab/LLM-qlm.
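The scoring idea is easy to state concretely: given a candidate document, prompt a pre-trained causal LLM with the document plus an instruction to write a question, then score the document by the log-likelihood the model assigns to the actual query tokens. Below is a minimal sketch of this zero-shot QLM scoring step using Hugging Face transformers. It is not the authors' implementation (that lives at https://github.com/ielab/LLM-qlm); the model checkpoint and the exact prompt wording are illustrative assumptions.

```python
# Minimal sketch of zero-shot query-likelihood scoring with a causal LLM.
# Not the authors' code (see https://github.com/ielab/LLM-qlm); the checkpoint
# and prompt wording below are assumptions chosen for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "tiiuae/falcon-7b"  # any pre-trained (non-instruction-tuned) causal LM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
model.eval()

def query_log_likelihood(document: str, query: str) -> float:
    """Return log P(query | document prompt), summed over query tokens."""
    prompt = (
        f"Passage: {document}\n"
        "Please write a question based on this passage.\n"
        "Question:"
    )
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    query_ids = tokenizer(
        " " + query, return_tensors="pt", add_special_tokens=False
    ).input_ids
    input_ids = torch.cat([prompt_ids, query_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position t predict token t+1, so the slice below aligns each
    # query token with the distribution from which it would be generated.
    log_probs = torch.log_softmax(logits[0, prompt_ids.size(1) - 1 : -1], dim=-1)
    token_log_probs = log_probs.gather(1, query_ids[0].unsqueeze(1)).squeeze(1)
    return token_log_probs.sum().item()

# Re-rank a first-stage candidate list by descending query likelihood:
# ranked = sorted(candidates, key=lambda d: query_log_likelihood(d, query), reverse=True)
```

In the paper's full system a scorer of this kind re-ranks candidates produced by a hybrid zero-shot first-stage retriever; the sketch above covers only the scoring step.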

Authors (4)
  1. Shengyao Zhuang
  2. Bing Liu
  3. Bevan Koopman
  4. Guido Zuccon