
A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models (2310.09497v2)

Published 14 Oct 2023 in cs.IR and cs.AI

Abstract: We propose a novel zero-shot document ranking approach based on LLMs: the Setwise prompting approach. Our approach complements existing prompting approaches for LLM-based zero-shot ranking: Pointwise, Pairwise, and Listwise. Through the first-of-its-kind comparative evaluation within a consistent experimental framework and considering factors like model size, token consumption, latency, among others, we show that existing approaches are inherently characterised by trade-offs between effectiveness and efficiency. We find that while Pointwise approaches score high on efficiency, they suffer from poor effectiveness. Conversely, Pairwise approaches demonstrate superior effectiveness but incur high computational overhead. Our Setwise approach, instead, reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, compared to previous methods. This significantly improves the efficiency of LLM-based zero-shot ranking, while also retaining high zero-shot ranking effectiveness. We make our code and results publicly available at https://github.com/ielab/LLM-rankers.

A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with LLMs

LLMs have recently demonstrated substantial efficacy in zero-shot document ranking tasks. However, their application has been hindered by trade-offs between ranking effectiveness and computational efficiency in the existing Pointwise, Pairwise, and Listwise prompting approaches. This paper evaluates these traditional methods within a consistent experimental framework and introduces a novel Setwise prompting approach that balances effectiveness and efficiency in LLM-based zero-shot ranking.

Evaluation of Traditional Approaches

The paper begins by analyzing the existing Pointwise, Pairwise, and Listwise prompting methods, with the primary objective of identifying trade-offs between effectiveness and computational efficiency. According to the results, Pointwise approaches are highly efficient but comparatively ineffective, which can be attributed to their scoring of each document in isolation. In contrast, Pairwise approaches achieve better effectiveness through explicit document-to-document comparisons, yet incur significant computational overhead because of the large number of LLM inferences required. Listwise methods, which generate an ordered ranking over a window of candidate documents at a time, show efficiency and effectiveness that vary with the specific configuration and evaluation setting.
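To make the contrast concrete, the sketch below shows simplified prompt templates in the spirit of the three approaches. The wording and function names are illustrative assumptions, not the paper's exact prompts.

```python
# Illustrative prompt templates for the three traditional approaches
# (simplified sketches, not the paper's exact wording).

def pointwise_prompt(query: str, doc: str) -> str:
    # One LLM call per document; the model judges relevance in isolation.
    return (f"Passage: {doc}\n"
            f"Query: {query}\n"
            "Is this passage relevant to the query? Answer Yes or No.")

def pairwise_prompt(query: str, doc_a: str, doc_b: str) -> str:
    # One LLM call per document pair; effectiveness is high, but the
    # number of comparisons grows quickly with the candidate pool size.
    return (f"Query: {query}\n"
            f"Passage A: {doc_a}\n"
            f"Passage B: {doc_b}\n"
            "Which passage is more relevant to the query? Answer A or B.")

def listwise_prompt(query: str, docs: list[str]) -> str:
    # A window of documents is ranked in a single generation step;
    # long generated outputs make latency sensitive to the window size.
    numbered = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return (f"Query: {query}\n{numbered}\n"
            "Rank the passages above from most to least relevant, "
            "listing their identifiers in order.")
```

The key structural difference is the unit of each LLM call: one document (Pointwise), one pair (Pairwise), or one window of documents (Listwise), which directly determines how many inferences and prompt tokens a full reranking run consumes.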

Introduction of the Setwise Approach

To address the shortcomings of the traditional methods, the authors propose a Setwise prompting approach designed to improve the efficiency of LLM-based zero-shot ranking by reducing both the number of LLM inferences and the amount of prompt tokens consumed. Rather than comparing two documents per inference, the LLM is asked to compare a set of several documents at once and select the most relevant one; this set-based comparison is then used as the comparison step of sorting algorithms such as heap sort and bubble sort. The authors position Setwise as a middle ground that combines the desirable characteristics of the Pointwise, Pairwise, and Listwise approaches.
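As a rough illustration of how the set-based comparison plugs into a sorting procedure, the following sketch implements a Setwise-style bubble-sort pass for top-k reranking. The callable `llm_pick_best(query, docs)`, which returns the index of the passage the LLM selects as most relevant, is a hypothetical placeholder; this is not the authors' implementation (their code is available in the linked repository).

```python
def setwise_bubble_pass(query, docs, start, set_size, llm_pick_best):
    # Slide a window of `set_size` passages from the bottom of `docs` toward
    # position `start`, carrying each window's winner upward so the passage
    # the LLM prefers overall ends up at docs[start].
    assert set_size >= 2, "a Setwise window must contain at least two passages"
    i = len(docs) - 1
    while i > start:
        lo = max(start, i - set_size + 1)
        window = docs[lo:i + 1]
        best = lo + llm_pick_best(query, window)  # index of the LLM's pick
        # Move the winner to the top of the window; it joins the next window.
        docs[lo], docs[best] = docs[best], docs[lo]
        i = lo
    return docs


def setwise_top_k(query, docs, k, set_size, llm_pick_best):
    # Top-k reranking with repeated Setwise bubble passes.
    docs = list(docs)
    for pos in range(min(k, len(docs))):
        setwise_bubble_pass(query, docs, pos, set_size, llm_pick_best)
    return docs[:k]
```

Each pass needs roughly (n - 1) / (set_size - 1) LLM calls instead of the n - 1 pairwise comparisons a standard bubble pass would make, which is the source of the efficiency gain.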

Empirical Evaluation and Results

Comprehensive empirical evaluations are conducted on the TREC Deep Learning datasets and the BEIR benchmark, using LLMs of various sizes, including the Flan-T5 models. The results show that Setwise prompting substantially reduces computational cost while maintaining high ranking effectiveness; for instance, Setwise approaches lower average query latency compared with the traditional methods without compromising ranking quality.

The experiments also reveal interesting sensitivity characteristics. Notably, Setwise is robust to the initial candidate ranking order, remaining more consistent than the existing approaches when that ordering changes. Additionally, using the model's output logits to estimate selection likelihoods, rather than generating answer text, further boosts Setwise's efficiency while retaining its effectiveness.
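A minimal sketch of the logits-based selection idea follows, assuming a Flan-T5 checkpoint from Hugging Face and an illustrative prompt; the exact prompt wording and scoring details in the paper may differ.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

def pick_best_with_logits(model, tokenizer, query, docs):
    # Label each passage A, B, C, ... and ask which is most relevant.
    labels = [chr(ord("A") + i) for i in range(len(docs))]
    passages = "\n".join(f"Passage {l}: {d}" for l, d in zip(labels, docs))
    prompt = (f"Query: {query}\n{passages}\n"
              "Which passage is the most relevant to the query? "
              "Answer with the passage label.")
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    # Score only the first decoded position: a single forward pass,
    # no autoregressive generation.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, -1]
    # Compare the logits of the label tokens and return the winner's index.
    label_ids = [tokenizer(l, add_special_tokens=False).input_ids[0] for l in labels]
    return int(logits[label_ids].argmax())

# Example usage (illustrative checkpoint choice):
# model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large")
# tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
# best = pick_best_with_logits(model, tokenizer, "what is setwise ranking?", candidate_docs)
```

Because only the first decoded token's logits over the passage labels are needed, a single forward pass replaces free-form generation, which is where the additional efficiency comes from.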

Practical and Theoretical Implications

The introduction of Setwise prompting has promising implications for both practical applications and theoretical exploration. Practically, the reduction in computational load and cost means that real-world applications, such as large-scale search engines and information retrieval systems, can integrate LLMs more efficiently. Theoretically, the paper points to opportunities to refine LLM capabilities in zero-shot scenarios, encouraging further research into auxiliary techniques such as prompt learning for enhanced performance.

Moreover, the robust performance of Setwise across various initial conditions indicates broad applicability, suggesting that future work could improve ranking across different retrieval pipelines and even extend these concepts to other natural language processing tasks beyond document ranking.

In conclusion, the paper showcases how a methodical redesign of prompting strategies—embodied by Setwise prompting—can lead to measurable advancements in zero-shot document ranking tasks, paving the way for more scalable and efficient use of LLMs in information retrieval applications.

Authors (4)
  1. Shengyao Zhuang
  2. Honglei Zhuang
  3. Bevan Koopman
  4. Guido Zuccon