
LLM-Oriented Retrieval Tuner (2403.01999v1)

Published 4 Mar 2024 in cs.CL

Abstract: Dense Retrieval (DR) is now considered a promising tool to enhance the memorization capacity of large language models (LLMs) such as GPT-3 and GPT-4 by incorporating external memories. However, due to the paradigm discrepancy between the text generation of LLMs and DR, it remains an open challenge to integrate the retrieval and generation tasks in a shared LLM. In this paper, we propose an efficient LLM-Oriented Retrieval Tuner, namely LMORT, which decouples DR capacity from the base LLM and non-invasively coordinates the LLM's optimally aligned and uniform layers towards a unified DR space, achieving efficient and effective DR without tuning the LLM itself. Extensive experiments on six BEIR datasets show that our approach achieves competitive zero-shot retrieval performance compared to a range of strong DR models while maintaining the generation ability of the LLM.
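
To make the abstract's core idea concrete, here is a minimal sketch of what a plug-in retrieval tuner of this kind could look like: a small trainable module that reads hidden states from two frozen LLM layers (one selected for alignment, one for uniformity, in the sense of Wang & Isola's contrastive-representation metrics) and fuses them into a single dense-retrieval embedding. The layer choice, module names, dimensions, and pooling strategy below are illustrative assumptions based only on the abstract, not the paper's exact architecture.

```python
# Hypothetical LMORT-style tuner sketch (assumptions, not the paper's code):
# the base LLM stays frozen; only this module is trained for retrieval.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrievalTuner(nn.Module):
    def __init__(self, hidden_dim: int = 768, n_heads: int = 12):
        super().__init__()
        # Self-attention over the "uniform" layer's states, cross-attention
        # into the "aligned" layer's states, then a feed-forward block.
        self.self_attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.norm3 = nn.LayerNorm(hidden_dim)

    def forward(self, h_uniform, h_aligned, pad_mask=None):
        # h_uniform / h_aligned: (batch, seq_len, hidden_dim) hidden states
        # captured from two frozen LLM layers under torch.no_grad().
        # pad_mask: (batch, seq_len) bool, True at padding positions.
        x = self.norm1(h_uniform + self.self_attn(
            h_uniform, h_uniform, h_uniform, key_padding_mask=pad_mask)[0])
        x = self.norm2(x + self.cross_attn(
            x, h_aligned, h_aligned, key_padding_mask=pad_mask)[0])
        x = self.norm3(x + self.ffn(x))
        # Mean-pool the non-padding positions into one retrieval embedding.
        if pad_mask is not None:
            keep = (~pad_mask).unsqueeze(-1).float()
            emb = (x * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        else:
            emb = x.mean(dim=1)
        return F.normalize(emb, dim=-1)
```

Because the backbone is never updated, a setup like this would preserve the LLM's generation ability by construction; queries and passages would be encoded with the same tuner and matched by cosine similarity, trained with a standard contrastive objective.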

