Are Long-LLMs A Necessity For Long-Context Tasks? (2405.15318v1)
Abstract: The learning and deployment of long-LLMs remains a challenging problem despite recent progress. In this work, we argue that long-LLMs are not a necessity for solving long-context tasks, as common long-context tasks are short-context solvable, i.e., they can be solved purely by working with oracle short contexts within the long-context tasks' inputs. On top of this argument, we propose a framework called LC-Boost (Long-Context Bootstrapper), which enables a short-LLM to address long-context tasks in a bootstrapping manner. In our framework, the short-LLM prompts itself to reason about two critical decisions: 1) how to access the appropriate part of the context within the input, and 2) how to make effective use of the accessed context. By adaptively accessing and utilizing the context based on the presented task, LC-Boost serves as a general framework for handling diverse long-context processing problems. We comprehensively evaluate different types of tasks from popular long-context benchmarks, where LC-Boost achieves substantially improved performance with much lower resource consumption.
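The two decisions named in the abstract suggest a simple iterative loop. The sketch below is a minimal, hypothetical illustration of such a bootstrapping loop in Python; the chunking policy, prompt wording, and the `short_llm` callable are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical sketch of an LC-Boost-style loop: a short-context LLM decides
# (1) which parts of a long input to access and (2) how to use them.
# All prompts and the chunking scheme below are illustrative assumptions.
from typing import Callable, List

def chunk(text: str, size: int = 3000) -> List[str]:
    """Split the long input into short-context windows the short-LLM can read."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def lc_boost_sketch(short_llm: Callable[[str], str], long_context: str, task: str) -> str:
    """Bootstrap a short-LLM over a long input, mirroring the two decisions
    described in the abstract: access the context, then use it."""
    state = ""  # accumulated task-relevant findings from accessed chunks
    for piece in chunk(long_context):
        # Decision 1: should this part of the context be accessed at all?
        access = short_llm(
            f"Task: {task}\nCurrent findings: {state}\n"
            f"Context chunk: {piece}\n"
            "Is this chunk relevant to the task? Answer yes or no."
        )
        if access.strip().lower().startswith("yes"):
            # Decision 2: how should the accessed context be used?
            state = short_llm(
                f"Task: {task}\nCurrent findings: {state}\n"
                f"Relevant chunk: {piece}\n"
                "Update the findings with any information needed for the task."
            )
    # Final answer produced from the accumulated short contexts only.
    return short_llm(f"Task: {task}\nFindings: {state}\nGive the final answer.")
```

In this reading, the short-LLM never sees the full long input at once; it only ever operates on short contexts plus its own accumulated state, which is what allows a short-context model to stand in for a long-LLM.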