Fine-Tuning LLaMA for Multi-Stage Text Retrieval (2310.08319v1)

Published 12 Oct 2023 in cs.IR

Abstract: The effectiveness of multi-stage text retrieval has been solidly demonstrated since before the era of pre-trained language models. However, most existing studies utilize models that predate recent advances in large language models (LLMs). This study seeks to explore potential improvements that state-of-the-art LLMs can bring. We conduct a comprehensive study, fine-tuning the latest LLaMA model both as a dense retriever (RepLLaMA) and as a pointwise reranker (RankLLaMA) for both passage retrieval and document retrieval using the MS MARCO datasets. Our findings demonstrate that the effectiveness of LLMs indeed surpasses that of smaller models. Additionally, since LLMs can inherently handle longer contexts, they can represent entire documents holistically, obviating the need for traditional segmenting and pooling strategies. Furthermore, evaluations on BEIR demonstrate that our RepLLaMA-RankLLaMA pipeline exhibits strong zero-shot effectiveness. Model checkpoints from this study are available on HuggingFace.
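
As a rough illustration of the bi-encoder setup the abstract describes: the page does not include the paper's code, so the base checkpoint name, the "query:"/"passage:" prefixes, and last-token pooling in the sketch below are assumptions rather than the authors' exact recipe. A RepLLaMA-style dense retriever can be approximated with Hugging Face Transformers as follows:

```python
# Minimal sketch of an LLM-based dense retriever in the spirit of RepLLaMA.
# Assumptions (not taken from this page): the base checkpoint, the prompt
# prefixes, and pooling the final token's hidden state as the embedding.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumed base model; the paper fine-tunes LLaMA

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, torch_dtype=torch.float16, device_map="auto")
model.eval()


def embed(text: str, prefix: str) -> torch.Tensor:
    """Encode text and use the hidden state of the final token as its dense vector."""
    # Append the end-of-sequence token so the last position summarizes the whole input.
    inputs = tokenizer(prefix + text + tokenizer.eos_token, return_tensors="pt").to(model.device)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, dim)
    return torch.nn.functional.normalize(hidden[0, -1], dim=-1)


query_vec = embed("how do multi-stage retrieval pipelines work", "query: ")
passage_vec = embed("Multi-stage retrieval pairs a fast first-stage retriever "
                    "with a more expensive reranker.", "passage: ")
print(float(query_vec @ passage_vec))  # cosine similarity serves as the relevance score
```

A RankLLaMA-style pointwise reranker would instead feed the concatenated query-document pair through the model and map the final hidden state to a single relevance score, which is then used to rescore the retriever's top candidates.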
