
FIT-RAG: Black-Box RAG with Factual Information and Token Reduction (2403.14374v1)

Published 21 Mar 2024 in cs.CL and cs.IR

Abstract: Due to the extraordinarily large number of parameters, fine-tuning LLMs to update long-tail or out-of-date knowledge is impractical in many applications. To avoid fine-tuning, we can alternatively treat an LLM as a black-box (i.e., freeze the parameters of the LLM) and augment it with a Retrieval-Augmented Generation (RAG) system, namely black-box RAG. Recently, black-box RAG has achieved success in knowledge-intensive tasks and has gained much attention. Existing black-box RAG methods typically fine-tune the retriever to cater to LLMs' preferences and concatenate all the retrieved documents as the input, which suffers from two issues: (1) Ignorance of Factual Information. The LLM preferred documents may not contain the factual information for the given question, which can mislead the retriever and hurt the effectiveness of black-box RAG; (2) Waste of Tokens. Simply concatenating all the retrieved documents brings large amounts of unnecessary tokens for LLMs, which degenerates the efficiency of black-box RAG. To address these issues, this paper proposes a novel black-box RAG framework which utilizes the factual information in the retrieval and reduces the number of tokens for augmentation, dubbed FIT-RAG. FIT-RAG utilizes the factual information by constructing a bi-label document scorer. Besides, it reduces the tokens by introducing a self-knowledge recognizer and a sub-document-level token reducer. FIT-RAG achieves both superior effectiveness and efficiency, which is validated by extensive experiments across three open-domain question-answering datasets: TriviaQA, NQ and PopQA. FIT-RAG can improve the answering accuracy of Llama2-13B-Chat by 14.3% on TriviaQA, 19.9% on NQ and 27.5% on PopQA, respectively. Furthermore, it can save approximately half of the tokens on average across the three datasets.

The paper "FIT-RAG: Black-Box RAG with Factual Information and Token Reduction" addresses significant challenges in knowledge-intensive tasks that utilize LLMs. Fine-tuning LLMs to update long-tail or out-of-date knowledge often proves impractical due to the sheer number of parameters involved. Instead, the authors advocate treating LLMs as black boxes (i.e., freezing their parameters) and augmenting them with a Retrieval-Augmented Generation (RAG) system, a method known as black-box RAG.

However, existing black-box RAG methodologies face two notable issues:

  1. Ignorance of Factual Information: Existing systems typically fine-tune the retriever to match LLMs' preferences, which may not always align with factual accuracy, potentially misleading the retriever and diminishing the efficacy of RAG.
  2. Waste of Tokens: By concatenating all retrieved documents into the input, many unnecessary tokens are included, reducing efficiency.
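The token-waste issue can be illustrated with a minimal sketch (the documents, question, and whitespace tokenization below are illustrative stand-ins, not from the paper): naive black-box RAG concatenates every retrieved document into the prompt, whether or not it helps answer the question.

```python
# Illustrative example of the token-waste issue in naive black-box RAG:
# all retrieved documents are concatenated into the prompt, so
# irrelevant documents still consume the LLM's input tokens.
retrieved = [
    "Paris is the capital of France.",          # relevant to the question
    "France exports wine and cheese.",          # irrelevant filler
    "The Eiffel Tower was completed in 1889.",  # irrelevant filler
]
question = "What is the capital of France?"

# Naive prompt assembly: every document, relevant or not.
naive_prompt = "\n".join(retrieved) + "\nQuestion: " + question

# Whitespace tokens as a rough proxy for LLM tokens; most add nothing.
print(len(naive_prompt.split()))
```

Only six of these whitespace tokens come from the one relevant document; the rest are the overhead FIT-RAG's token reduction aims to cut.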

To mitigate these issues, the authors propose a novel black-box RAG framework named FIT-RAG. This framework introduces two main strategies:

  1. Bi-label Document Scorer: This mechanism utilizes factual information to improve the retrieval process. It ensures that the documents selected for augmentation align with the factual requirements of the given question.
  2. Self-Knowledge Recognizer and Sub-document-level Token Reducer: These components minimize token waste. The self-knowledge recognizer decides whether retrieval is needed at all (i.e., whether the LLM can already answer from its own knowledge), and the sub-document-level token reducer keeps only the essential sub-documents from the retrieved results, selectively cutting token usage for augmentation.
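The two strategies above can be sketched as a single pipeline. Note that this is a hypothetical illustration: the scoring heuristics, the `known_topics` lookup, and the sentence-level reduction below are simple stand-ins for the trained models the paper actually uses.

```python
# Hypothetical sketch of the FIT-RAG pipeline stages; component
# internals are toy stand-ins, not the paper's trained models.
from dataclasses import dataclass

@dataclass
class ScoredDoc:
    text: str
    has_answer: float    # label 1: does the doc contain factual info?
    llm_preferred: float # label 2: does the LLM favor this doc?

def bi_label_score(question: str, doc: str) -> ScoredDoc:
    # Stand-in bi-label scorer: the paper trains a model for both labels;
    # here, word overlap approximates the factual-information label.
    overlap = any(w in doc.lower() for w in question.lower().split())
    return ScoredDoc(doc, has_answer=1.0 if overlap else 0.0, llm_preferred=0.5)

def needs_retrieval(question: str, known_topics: set) -> bool:
    # Self-knowledge recognizer stand-in: skip retrieval entirely when
    # the LLM is assumed to already know the topic.
    return not any(t in question.lower() for t in known_topics)

def reduce_tokens(docs: list, budget: int) -> str:
    # Sub-document-level token reducer stand-in: keep the highest-scored
    # sentences until a (whitespace-)token budget is exhausted.
    sents = [(d.has_answer + d.llm_preferred, s)
             for d in docs for s in d.text.split(". ") if s]
    sents.sort(key=lambda p: p[0], reverse=True)
    kept, used = [], 0
    for _, s in sents:
        n = len(s.split())
        if used + n > budget:
            break
        kept.append(s)
        used += n
    return ". ".join(kept)

def fit_rag_prompt(question, corpus, known_topics, budget=50):
    if not needs_retrieval(question, known_topics):
        return question  # answer from the LLM's own knowledge, zero extra tokens
    scored = sorted((bi_label_score(question, d) for d in corpus),
                    key=lambda d: (d.has_answer, d.llm_preferred), reverse=True)
    context = reduce_tokens(scored[:3], budget)
    return f"Context: {context}\nQuestion: {question}"
```

The design point the sketch captures is the ordering: the recognizer can short-circuit retrieval entirely, and only documents scored for factual content reach the token-budgeted reducer.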

The efficacy and efficiency of FIT-RAG are validated through extensive experiments on three open-domain question-answering datasets: TriviaQA, NQ, and PopQA. The results are notable, with FIT-RAG significantly improving the answering accuracy of Llama2-13B-Chat:

  • 14.3% increase on TriviaQA
  • 19.9% increase on NQ
  • 27.5% increase on PopQA

Moreover, FIT-RAG demonstrates substantial improvements in efficiency, reducing token usage by approximately half across the three datasets.

In summary, FIT-RAG represents a significant advancement in black-box RAG systems, offering both enhanced accuracy and efficiency by addressing the fundamental issues of factual information inclusion and token waste reduction.

Authors (6)
  1. Yuren Mao (17 papers)
  2. Xuemei Dong (4 papers)
  3. Wenyi Xu (6 papers)
  4. Yunjun Gao (67 papers)
  5. Bin Wei (25 papers)
  6. Ying Zhang (388 papers)
Citations (4)