Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs (2402.12052v3)
Abstract: The integration of LLMs and search engines represents a significant evolution in knowledge-acquisition methods. However, determining which knowledge an LLM already possesses and which requires the help of a search engine remains an unresolved issue. Most existing methods address this by having the LLM itself produce a preliminary answer or reasoning chain, which incurs excessively high computational costs. This paper introduces SlimPLM, a novel collaborative approach that detects missing knowledge in LLMs with a slim proxy model, enhancing the LLM's knowledge-acquisition process. We employ a proxy model with far fewer parameters and take its outputs as heuristic answers. These heuristic answers are then used to predict the knowledge required to answer the user question and to distinguish the known from the unknown knowledge within the LLM. Retrieval is performed only for the missing knowledge in questions the LLM does not know. Extensive experiments on five datasets with two LLMs demonstrate a notable improvement in end-to-end question-answering performance, matching or surpassing current state-of-the-art models at lower LLM inference cost.
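The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the callables `proxy_generate`, `judge_claims`, `search`, and `llm_generate`, as well as the prompt format, are assumptions introduced here for clarity.

```python
# Rough sketch of a SlimPLM-style pipeline, as described in the abstract.
# All model/search calls are hypothetical placeholders supplied by the caller.

from typing import Callable, List

def slimplm_answer(
    question: str,
    proxy_generate: Callable[[str], str],           # slim proxy model (far fewer parameters)
    judge_claims: Callable[[str, str], List[str]],  # returns claims the target LLM likely lacks
    search: Callable[[str], List[str]],             # search engine / retriever
    llm_generate: Callable[[str], str],             # large target LLM
) -> str:
    # 1. Draft a cheap "heuristic answer" with the slim proxy model.
    heuristic = proxy_generate(question)

    # 2. Use the heuristic answer to predict which pieces of required
    #    knowledge the target LLM is missing.
    missing = judge_claims(question, heuristic)

    # 3. If nothing is judged missing, skip retrieval entirely and
    #    answer directly with the large LLM.
    if not missing:
        return llm_generate(question)

    # 4. Retrieve only for the missing knowledge, then answer with context.
    passages: List[str] = []
    for claim in missing:
        passages.extend(search(claim))

    context = "\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm_generate(prompt)
```

In this sketch, the only extra work on the no-retrieval path is the proxy generation and the claim-level judgment, which is consistent with the abstract's claim of lower LLM inference cost compared to methods that have the large LLM itself produce preliminary answers or reasoning.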
Authors: Jiejun Tan, Zhicheng Dou, Yutao Zhu, Peidong Guo, Kun Fang, Ji-Rong Wen