Large Search Model: Redefining Search Stack in the Era of LLMs (2310.14587v2)
Abstract: Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others. These components are often optimized and deployed independently. In this paper, we introduce a novel conceptual framework called the large search model, which redefines the conventional search stack by unifying search tasks with a single large language model (LLM). All tasks are formulated as autoregressive text generation problems, allowing them to be customized through natural language prompts. The proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simplifying the existing cumbersome search stack. To substantiate the feasibility of this framework, we present a series of proof-of-concept experiments and discuss the potential challenges of deploying this approach in real-world search systems.
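To make the unified formulation concrete, the sketch below illustrates one plausible reading of the idea (not the paper's actual implementation): several traditional search-stack components, such as query understanding, ranking, and question answering, are each reduced to a prompt template decoded by the same model. The `llm_generate` function and the template wording are hypothetical stand-ins for whatever decoder-only LLM and prompts a real system would use.

```python
# Minimal sketch (an illustration, not the paper's implementation) of casting
# classic search-stack components as prompted autoregressive generation
# against one shared LLM. `llm_generate` is a hypothetical stand-in for any
# decoder-only model's completion API.

def llm_generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Placeholder: autoregressively continue `prompt` with the shared LLM."""
    raise NotImplementedError("plug in a real model or hosted completion API here")

# One prompt template per traditional component; only the instruction changes,
# while the underlying model stays the same.
TEMPLATES = {
    "query_understanding": (
        "Rewrite the user query so it is unambiguous and self-contained.\n"
        "Query: {query}\nRewritten query:"
    ),
    "ranking": (
        "Query: {query}\nPassage: {passage}\n"
        "Is this passage relevant to the query? Answer Yes or No:"
    ),
    "question_answering": (
        "Answer the question using only the passages below.\n"
        "Passages:\n{passages}\nQuestion: {query}\nAnswer:"
    ),
}

def run_task(task: str, **fields: str) -> str:
    """Format the task-specific prompt and decode it with the single shared model."""
    return llm_generate(TEMPLATES[task].format(**fields))

# Example usage: the same model serves every stage of the stack.
# rewritten = run_task("query_understanding", query="jaguar speed")
# verdict   = run_task("ranking", query="jaguar speed", passage="...")
# answer    = run_task("question_answering", query="jaguar speed", passages="...")
```

Under this framing, swapping a pipeline stage means editing a prompt rather than retraining and redeploying a dedicated model, which is the simplification of the search stack the abstract refers to.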
Authors: Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei