StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses (2403.08312v3)

Published 13 Mar 2024 in cs.CL and cs.AI

Abstract: Standard LLMs struggle with handling dialogues with long contexts due to efficiency and consistency issues. According to our observation, dialogue contexts are highly structured, and the special End-of-Utterance (EoU) token in dialogues has the potential to aggregate information. We refer to these EoU tokens as "conversational attention sinks" (conv-attn sinks). Accordingly, we introduce StreamingDialogue, which compresses long dialogue history into conv-attn sinks with minimal losses, reducing computational complexity to quadratic in the number of sinks (i.e., the number of utterances). Current LLMs already handle long context windows, e.g., a window size of 200K or more; by compressing utterances into EoUs, our method therefore has the potential to handle more than 200K utterances, enabling prolonged dialogue learning. To minimize information loss from reconstruction after compression, we design two learning strategies: short-memory reconstruction (SMR) and long-memory reactivation (LMR). Our method outperforms strong baselines on dialogue tasks and achieves a 4× speedup while reducing memory usage by 18× compared to dense attention recomputation.
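
For illustration, below is a minimal PyTorch sketch of the sparse attention pattern the abstract describes: each query token attends to the conv-attn sink (EoU) tokens of earlier utterances plus the tokens of its own utterance, rather than the full dialogue history. This is not the authors' implementation; the helper name conv_attn_sink_mask, the toy utterance layout, and the mask construction are assumptions made purely for exposition.

```python
import torch

def conv_attn_sink_mask(utt_ids: torch.Tensor, is_eou: torch.Tensor) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask; True means the query may attend to the key.

    utt_ids: (seq_len,) integer id of the utterance each token belongs to.
    is_eou:  (seq_len,) True at End-of-Utterance (conv-attn sink) positions.
    """
    seq_len = utt_ids.size(0)
    causal = torch.tril(torch.ones(seq_len, seq_len)).bool()       # standard causal mask
    same_utt = utt_ids.unsqueeze(0) == utt_ids.unsqueeze(1)        # key lies in the query's own utterance
    sink_key = is_eou.unsqueeze(0).expand(seq_len, seq_len)        # key is a conv-attn sink (EoU)
    return causal & (same_utt | sink_key)

# Toy dialogue: three utterances of lengths 3, 2, 4; the last token of each is its EoU.
utt_ids = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2])
is_eou = torch.tensor([False, False, True, False, True, False, False, False, True])
mask = conv_attn_sink_mask(utt_ids, is_eou)
print(mask.int())
# Queries in the last utterance (rows 5-8) attend only to the earlier EoU sinks
# (columns 2 and 4) and to their own utterance, not to the full history.
```

Under this assumed pattern, per-query attention cost grows with the number of past utterances rather than the number of past tokens, which is the intuition behind the quadratic-in-sinks complexity claim.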
