Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use
Abstract: In this paper, we demonstrate that an inherent waveform pattern in the attention allocation of LLMs significantly affects their performance on tasks demanding a high degree of context awareness, such as using LLMs for tool use. Specifically, crucial information in the context may be overlooked by the model when it is positioned in a trough zone of the attention waveform, leading to decreased performance. To address this issue, we propose a novel inference method named Attention Buckets. It allows LLMs to process their input through multiple parallel processes, each using a distinct base angle for the rotary position embedding and thereby producing a unique attention waveform. By compensating for an attention trough in one process with an attention peak in another, our approach enhances the LLM's awareness of various contextual positions, mitigating the risk of overlooking crucial information. On the largest tool-use benchmark, our method elevates a 7B model to state-of-the-art performance, comparable to that of GPT-4. On other benchmarks and several RAG tasks, which also demand a thorough understanding of contextual content, Attention Buckets likewise exhibits notable performance gains.
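The core mechanism the abstract describes, running the same input through parallel passes that differ only in the RoPE base angle, can be illustrated with a minimal sketch of rotary position embedding with a configurable base. The function name, dimensions, and the two demo base values below are illustrative assumptions, not taken from the paper; real implementations apply this rotation to query/key tensors inside each attention head.

```python
import math

def rope_rotate(x, pos, base=10000.0):
    """Apply rotary position embedding to vector x at position pos.

    x: list of floats with even length; consecutive pairs are rotated.
    base: the RoPE base angle. Attention Buckets runs parallel passes,
    each with a different base, so each pass's attention waveform has
    its peaks and troughs at different context positions.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        # standard RoPE frequency for pair i//2: theta = pos * base^(-i/d)
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s,
                    x[i] * s + x[i + 1] * c])
    return out

# Two parallel "processes" rotate the same query with different bases,
# yielding different position-dependent embeddings (hence different
# attention waveforms over the context).
q = [1.0, 0.0, 1.0, 0.0]
pass_a = rope_rotate(q, pos=5, base=10000.0)  # default LLaMA-style base
pass_b = rope_rotate(q, pos=5, base=15000.0)  # hypothetical second bucket
assert pass_a != pass_b  # distinct bases -> distinct rotations
```

In the method itself, the outputs of these parallel passes are combined so that a position falling in one pass's attention trough can be covered by another pass's peak; the sketch only shows why changing the base shifts the position-dependent pattern.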