AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models (2409.01579v1)

Published 3 Sep 2024 in cs.CL and cs.AI

Abstract: Noise in retrieved documents hinders RAG from detecting answer clues and makes inference slow and expensive, so context compression is necessary to improve its accuracy and efficiency. Existing context compression methods use extractive or generative models to retain the most query-relevant sentences, or apply information bottleneck theory to preserve sufficient information; however, they may suffer from over-compression or high computational cost. We observe that the retriever often ranks relevant documents at the top, but the exact number of documents needed to answer a query is uncertain because of query complexity and retrieval quality: complex queries such as multi-hop questions may require retaining more documents than simpler ones, and low-quality retrieval may need to rely on more documents to generate accurate outputs. Determining the minimum number of required documents (the compression rate) therefore remains a challenge for RAG. In this paper, we introduce AdaComp, a low-cost extractive context compression method that adaptively determines the compression rate based on both query complexity and retrieval quality. Specifically, we first annotate the minimum top-k documents necessary for the RAG system to answer the current query as the compression rate, and construct triplets of the query, the retrieved documents, and this compression rate. We then use this triplet dataset to train a compression-rate predictor. Experiments on three QA datasets and one conversational multi-doc QA dataset show that AdaComp significantly reduces inference cost while maintaining performance nearly identical to that of uncompressed models, striking a balance between efficiency and performance.
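The abstract describes a two-stage pipeline: first, for each training query, find the smallest top-k prefix of the ranked documents with which the RAG system still answers correctly, and record that k as the compression rate; second, train a predictor that maps a (query, retrieved documents) pair to k, which is then used at inference to truncate the document list. The sketch below illustrates that flow under loose assumptions: rag_answer, featurize, and predictor are hypothetical placeholders (not the authors' code), and exact-match scoring stands in for whatever answer-correctness check the paper actually uses.

```python
# A minimal sketch of AdaComp's data construction and adaptive inference,
# based only on the abstract. All callables here are assumed stand-ins.
from dataclasses import dataclass


@dataclass
class Triplet:
    query: str
    documents: list        # retrieved documents, ranked by the retriever
    compression_rate: int  # minimum top-k needed to answer correctly


def annotate_compression_rate(query, ranked_docs, gold_answer, rag_answer):
    """Stage 1: find the smallest top-k prefix of the ranked documents with
    which the RAG system still produces the gold answer.

    `rag_answer(query, docs) -> str` is an assumed wrapper around the
    generator; exact-match comparison is a simplification of the paper's
    (unspecified-in-the-abstract) correctness criterion."""
    for k in range(1, len(ranked_docs) + 1):
        if rag_answer(query, ranked_docs[:k]).strip() == gold_answer.strip():
            return k
    return len(ranked_docs)  # fallback: keep everything if no prefix suffices


def build_triplets(examples, rag_answer):
    """Build the (query, documents, compression rate) training set.
    `examples` is an iterable of (query, ranked_docs, gold_answer)."""
    return [
        Triplet(q, docs, annotate_compression_rate(q, docs, gold, rag_answer))
        for q, docs, gold in examples
    ]


def compress(query, ranked_docs, predictor, featurize):
    """Stage 2, at inference: keep only the predicted top-k documents.

    `predictor` maps features of (query, docs) to an integer k; `featurize`
    is an assumed feature extractor, e.g. query encoding plus per-document
    retrieval scores, so the predictor can reflect both query complexity
    and retrieval quality."""
    k = predictor(featurize(query, ranked_docs))
    return ranked_docs[:k]
```

Note the design point the abstract emphasizes: because the predictor conditions on both the query and the retrieved documents, the kept prefix grows for multi-hop queries or weak retrievals and shrinks for easy ones, rather than using a fixed compression rate.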
