Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models (2312.02969v1)
Abstract: Listwise rerankers based on large language models (LLMs) are the zero-shot state of the art. However, all current work in this direction depends on GPT models, making them a single point of failure in scientific reproducibility. Moreover, it raises the concern that current research findings hold only for GPT models and not for LLMs in general. In this work, we lift this precondition and, for the first time, build effective listwise rerankers without any form of dependency on GPT. Our passage retrieval experiments show that our best listwise reranker surpasses listwise rerankers based on GPT-3.5 by 13% and achieves 97% of the effectiveness of those built on GPT-4. Our results also show that the existing training datasets, which were expressly constructed for pointwise ranking, are insufficient for building such listwise rerankers. Instead, high-quality listwise ranking data is crucial, calling for further work on building human-annotated listwise data resources.
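To make the setup concrete, below is a minimal sketch of the kind of listwise reranking loop such systems use: candidate passages are numbered inside a single prompt, the LLM returns an ordering of identifiers, and a sliding window is swept over the candidate list. The prompt wording, the `generate` callable (any text-in/text-out LLM interface), and the window/stride values are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of zero-shot listwise reranking with an LLM (RankGPT/RankVicuna-style).
# `generate`, the prompt wording, and the window/stride defaults are assumptions
# for illustration, not the paper's exact implementation.
import re
from typing import Callable, List


def build_prompt(query: str, passages: List[str]) -> str:
    """Number the candidate passages and ask for a ranked list of identifiers."""
    lines = [f"[{i + 1}] {p}" for i, p in enumerate(passages)]
    return (
        f"Rank the following {len(passages)} passages by relevance to the query.\n"
        f"Query: {query}\n" + "\n".join(lines) +
        "\nAnswer with identifiers only, most relevant first, e.g. [2] > [1] > [3]."
    )


def parse_ranking(output: str, n: int) -> List[int]:
    """Extract 0-based passage indices from the model output; append any missing ones."""
    seen: List[int] = []
    for m in re.findall(r"\[(\d+)\]", output):
        idx = int(m) - 1
        if 0 <= idx < n and idx not in seen:
            seen.append(idx)
    return seen + [i for i in range(n) if i not in seen]


def rerank_window(query: str, passages: List[str],
                  generate: Callable[[str], str]) -> List[str]:
    """Rerank one window of passages with a single LLM call."""
    order = parse_ranking(generate(build_prompt(query, passages)), len(passages))
    return [passages[i] for i in order]


def listwise_rerank(query: str, passages: List[str],
                    generate: Callable[[str], str],
                    window: int = 20, stride: int = 10) -> List[str]:
    """Slide a window from the bottom of the candidate list toward the top so that
    strong passages can bubble upward across overlapping windows."""
    ranked = list(passages)
    start = max(len(ranked) - window, 0)
    while True:
        ranked[start:start + window] = rerank_window(query, ranked[start:start + window], generate)
        if start == 0:
            break
        start = max(start - stride, 0)
    return ranked
```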
- Xinyu Zhang
- Sebastian Hofstätter
- Patrick Lewis
- Raphael Tang
- Jimmy Lin