Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs (2402.12052v3)

Published 19 Feb 2024 in cs.CL

Abstract: The integration of LLMs and search engines represents a significant evolution in knowledge acquisition methodologies. However, determining the knowledge that an LLM already possesses and the knowledge that requires the help of a search engine remains an unresolved issue. Most existing methods solve this problem through the results of preliminary answers or reasoning done by the LLM itself, but this incurs excessively high computational costs. This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in LLMs with a slim proxy model, to enhance the LLM's knowledge acquisition process. We employ a proxy model which has far fewer parameters, and take its answers as heuristic answers. Heuristic answers are then utilized to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM. We only conduct retrieval for the missing knowledge in questions that the LLM does not know. Extensive experimental results on five datasets with two LLMs demonstrate a notable improvement in the end-to-end performance of LLMs in question-answering tasks, achieving or surpassing current state-of-the-art models with lower LLM inference costs.

Leveraging Slim Proxy Models for Efficient Knowledge Retrieval in LLMs

Introduction to SlimPLM

This paper introduces SlimPLM (Slim Proxy LLM), a method for enhancing LLMs through efficient, targeted knowledge retrieval. SlimPLM incorporates a smaller proxy model to decide when external knowledge should be retrieved to answer a user's question and what should be retrieved. By avoiding unnecessary retrieval, it reduces computational costs while matching or surpassing state-of-the-art performance on question-answering tasks across several datasets.
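
To make this decision flow concrete, below is a minimal Python sketch of a SlimPLM-style "when to retrieve" pipeline. All four callables (`proxy_answer`, `needs_retrieval`, `search`, `large_llm`) are hypothetical stand-ins for the proxy model, the retrieval-necessity judgment, the search engine, and the target LLM; they are not the paper's actual models, prompts, or judgment criteria.

```python
# A minimal sketch of a SlimPLM-style retrieval decision flow, using
# illustrative placeholders for every component.

def proxy_answer(question: str) -> str:
    """Stand-in for the slim proxy LLM producing a heuristic answer."""
    return f"(heuristic draft answer for: {question})"

def needs_retrieval(heuristic_answer: str) -> bool:
    """Stand-in for the retrieval-necessity judgment.

    A real system would score the heuristic answer's quality with a trained
    judge; here a toy length check stands in for that decision.
    """
    return len(heuristic_answer) < 200

def search(query: str) -> list[str]:
    """Stand-in for the search engine / retriever."""
    return [f"(passage retrieved for: {query})"]

def large_llm(question: str, evidence: list[str]) -> str:
    """Stand-in for the large target LLM writing the final answer."""
    context = "\n".join(evidence) if evidence else "(no retrieved evidence)"
    return f"Answer to '{question}' based on:\n{context}"

def answer(question: str) -> str:
    draft = proxy_answer(question)                      # cheap proxy pass
    evidence = search(question) if needs_retrieval(draft) else []
    return large_llm(question, evidence)                # single large-LLM call

print(answer("Who won the 2022 FIFA World Cup?"))
```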

The Challenge with Existing Methods

Integrating LLMs with retrieval systems has significantly improved the quality of generated text. However, determining whether and when retrieval is necessary remains challenging: existing approaches increase computational demands, and retrieving irrelevant information can degrade performance. SlimPLM addresses these challenges by delegating this decision to a proxy model, offering a more efficient solution.

SlimPLM Methodology

  • Proxy Model Utilization: At the core of SlimPLM is a smaller model used as a heuristic tool. This "proxy model" first answers the user's question; its heuristic answer is then used to predict what the LLM already knows and to identify knowledge gaps that require external retrieval.
  • Retrieval Necessity and Query Rewriting: SlimPLM decides whether to retrieve by assessing the quality of the heuristic answer produced by the proxy model. A low-quality heuristic answer triggers retrieval, and the proxy's output is further used to generate refined queries so that only the relevant missing knowledge is sought (see the sketch after this list).
  • Experimental Validation: Evaluated on five question-answering datasets with two LLMs, SlimPLM improved end-to-end performance, matching or exceeding current state-of-the-art models at significantly lower LLM inference costs.
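
As a rough illustration of the query-rewriting step in the second bullet, the following sketch splits a heuristic answer into claims and builds search queries only for claims the LLM is assumed not to know. The helpers `split_into_claims` and `llm_already_knows` are toy stand-ins; the paper uses trained models for claim segmentation and for judging which claims the target LLM already knows.

```python
# A rough illustration of heuristic-answer-driven query rewriting, under the
# assumption that claim splitting and the known/unknown check are toy stand-ins.

def split_into_claims(heuristic_answer: str) -> list[str]:
    """Toy claim splitter: one claim per sentence."""
    return [s.strip() for s in heuristic_answer.split(".") if s.strip()]

def llm_already_knows(claim: str, known_facts: set[str]) -> bool:
    """Stand-in for judging whether the target LLM already covers a claim."""
    return claim in known_facts

def rewrite_queries(question: str, heuristic_answer: str,
                    known_facts: set[str]) -> list[str]:
    """Build search queries only for claims the LLM does not appear to know."""
    queries = []
    for claim in split_into_claims(heuristic_answer):
        if not llm_already_knows(claim, known_facts):
            # Anchor each missing claim to the original question so the
            # retriever receives focused, on-topic queries.
            queries.append(f"{question} {claim}")
    return queries

draft = "The film was directed by X. It premiered in 2019. It won an award."
print(rewrite_queries("Who directed the film?", draft,
                      known_facts={"It premiered in 2019"}))
```

In this sketch, retrieval would then be run only for the generated queries, and the retrieved passages would be passed to the large LLM together with the original question.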

Contributions and Implications

SlimPLM's introduction marks a significant step toward optimizing knowledge retrieval in LLMs, contributing to:

  • Efficient determination of retrieval necessity via a smaller proxy model, minimizing unwarranted external searches.
  • Enhanced relevance in retrieval through heuristic answer-based refined query generation.
  • Notable performance improvements in end-to-end question-answering tasks while lowering LLM inference costs.

The practical implications of this approach are significant, especially in environments where computational resources are limited and precise, efficient retrieval is paramount. By judiciously deciding when to retrieve and what information to seek, SlimPLM can enable more sustainable and cost-effective deployments of advanced LLMs.

Future Directions

Looking forward, the application of SlimPLM opens new avenues for research, particularly in refining the proxy model's accuracy and expanding the method's adaptability across various domains. Furthermore, simplifying the current three-model pipeline into a more integrated system could further enhance efficiency and applicability. The exploration of proxy models in different sizes and configurations also presents an exciting area for further investigation, potentially leading to more tailored and domain-specific solutions.

Conclusion

SlimPLM's innovative approach to utilizing slim proxy models for effective retrieval represents a significant stride in the ongoing evolution of LLMs. By intelligently determining the need for external knowledge retrieval, SlimPLM not only optimizes the performance of LLMs in rendering accurate answers but also significantly reduces computational demands. As such, SlimPLM establishes a promising foundation for future endeavors aiming to harmonize efficiency with performance in the field of generative AI and LLMs.

Authors (6)
  1. Jiejun Tan
  2. Zhicheng Dou
  3. Yutao Zhu
  4. Peidong Guo
  5. Kun Fang
  6. Ji-Rong Wen