
Compressing Lengthy Context With UltraGist (2405.16635v2)

Published 26 May 2024 in cs.CL

Abstract: Compressing lengthy context is a critical but technically challenging problem. In this paper, we propose a new method called UltraGist, which is distinguished for its high-quality compression of lengthy context due to the innovative design of the compression and learning algorithm. UltraGist brings forth the following important benefits. Firstly, it notably contributes to the flexibility of compression, as it can be effectively learned to support a broad range of context lengths and compression ratios. Secondly, it helps to produce fine-grained compression for the lengthy context, where each small segment of the context is progressively processed on top of a tailored cross-attention mechanism. Thirdly, it makes the training process sample-efficient and thus maximizes the use of training data. Finally, it facilitates the efficient running of compression for dynamic context, as the compression result can be progressively generated and hence incrementally updated. UltraGist is evaluated on a wide variety of tasks associated with lengthy context, such as document QA and summarization, few-shot learning, and multi-session conversation. While existing methods fail to handle these challenging scenarios, our approach is able to preserve near-lossless compression performance throughout all the evaluations. Our data, model, and code have been released at \url{https://github.com/namespace-Pt/UltraGist}.
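The abstract describes compressing a long context segment by segment, with each segment summarized through a cross-attention mechanism so that the compressed memory can be extended incrementally as new context arrives. The sketch below is a minimal toy illustration of that general idea, not the authors' released implementation: the module name `ToyProgressiveCompressor`, the hyperparameters, and the use of learned query vectors are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class ToyProgressiveCompressor(nn.Module):
    """Toy sketch of segment-wise context compression via cross-attention.
    NOT the UltraGist implementation; all names and sizes are invented."""

    def __init__(self, d_model=256, n_heads=4, segment_len=128, ratio=8):
        super().__init__()
        self.segment_len = segment_len
        # Each segment of `segment_len` tokens is summarized into
        # `segment_len // ratio` compressed tokens.
        n_gist = segment_len // ratio
        # Learned query vectors that attend over a segment's token states.
        self.gist_queries = nn.Parameter(torch.randn(n_gist, d_model) * 0.02)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    @torch.no_grad()
    def compress(self, hidden_states: torch.Tensor) -> torch.Tensor:
        """hidden_states: (batch, seq_len, d_model) token representations.
        Returns a compressed memory built segment by segment; earlier
        segments are never revisited, so the memory can be appended to
        incrementally for streaming / dynamic context."""
        batch = hidden_states.size(0)
        memory = []
        for start in range(0, hidden_states.size(1), self.segment_len):
            segment = hidden_states[:, start:start + self.segment_len]
            queries = self.gist_queries.unsqueeze(0).expand(batch, -1, -1)
            # Cross-attention: learned queries summarize this segment only.
            compressed, _ = self.cross_attn(queries, segment, segment)
            memory.append(compressed)
        return torch.cat(memory, dim=1)


if __name__ == "__main__":
    compressor = ToyProgressiveCompressor()
    ctx = torch.randn(1, 1024, 256)   # stand-in for hidden states of a long context
    mem = compressor.compress(ctx)
    print(mem.shape)                  # torch.Size([1, 128, 256]) -> 8x compression
```

Because each segment is summarized independently of later ones, appending new context only requires compressing the new segments, which mirrors the "progressively generated and incrementally updated" property the abstract claims; how UltraGist actually parameterizes the compression tokens and attention is specified in the paper and repository.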

