Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use (2312.04455v4)

Published 7 Dec 2023 in cs.CL, cs.AI, and cs.LG

Abstract: In this paper, we demonstrate that an inherent waveform pattern in the attention allocation of LLMs significantly affects their performance in tasks demanding a high degree of context awareness, such as using LLMs for tool use. Specifically, crucial information in the context may be overlooked by the model when it is positioned in the trough zone of the attention waveform, leading to decreased performance. To address this issue, we propose a novel inference method named Attention Buckets. It allows LLMs to process their input through multiple parallel processes, each using a distinct base angle for the rotary position embedding and thereby creating a unique attention waveform. By compensating for an attention trough in one process with an attention peak from another, our approach enhances the LLM's awareness of various contextual positions, mitigating the risk of overlooking crucial information. On the largest tool-use benchmark, our method elevates a 7B model to state-of-the-art performance, comparable to that of GPT-4. On other benchmarks and on RAG tasks, which also demand a thorough understanding of contextual content, Attention Buckets likewise delivers notable performance gains.

Understanding the Impact of Attention Allocation in LLMs

Context Awareness Challenge in LLMs

LLMs have become highly capable 'tool agents', able to invoke external tools and APIs to carry out complex tasks. An often-overlooked aspect of this capability, however, is the model's attention mechanism, specifically how attention is allocated across different parts of the context it processes. This paper examines how the model's attention pattern, which can exhibit a waveform over context positions, affects its performance when using tools. The crux of the issue is that essential information can be missed if it falls within what the paper terms an 'attention trough' of this waveform.
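
To make the mechanism concrete, the sketch below (not taken from the paper's code) shows plain RoPE with a configurable base: the base sets the rotation frequencies, so scoring the same query-key pair at increasing distances traces out a position-dependent curve whose shape changes with the base. The dimensions, random vectors, and candidate bases are illustrative assumptions.

```python
# Minimal RoPE sketch with a configurable base, to show that the base
# changes how attention scores vary with relative position.
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (n, dim) at the given positions."""
    dim = x.shape[-1]
    # One rotation frequency per pair of dimensions; a larger base rotates more slowly.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    angles = np.outer(positions, inv_freq)          # (n, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = np.empty_like(x)
    rotated[..., 0::2] = x1 * cos - x2 * sin
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return rotated

# Score a fixed query against the same key placed at different distances;
# the resulting curve oscillates, and its shape depends on the chosen base.
rng = np.random.default_rng(0)
dim, seq_len = 64, 256
q = rng.normal(size=(1, dim))
k = rng.normal(size=(1, dim))
for base in (10000.0, 15000.0, 25000.0):  # illustrative bases, not the paper's
    scores = [
        (rope_rotate(q, [0], base) @ rope_rotate(k, [p], base).T).item()
        for p in range(seq_len)
    ]
    print(f"base={base:>8.0f}  score range: {min(scores):+.2f} .. {max(scores):+.2f}")
```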

Attention Buckets: Parallel Processing Enhancement

To mitigate the risk of missing critical details in the context, the researchers introduce a method called Attention Buckets. The technique runs the LLM over the same input in several parallel processes, each using a different base angle for the Rotary Position Embedding (RoPE), which gives each run a distinct attention waveform. The pivotal idea is that the troughs in one run's attention waveform are compensated by the peaks of another, so every contextual position receives adequate attention in at least one run. The outputs of these parallel runs are then aggregated, combining the strengths of the varied attention allocations for a more comprehensive understanding of the context and more robust performance.
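
A highly simplified sketch of this parallel-run structure follows. The `run_with_base` stub stands in for a full model forward pass whose RoPE base has been overridden, and the pick-the-most-confident-run aggregation is an illustrative assumption rather than the paper's exact aggregation rule.

```python
# Sketch of Attention-Buckets-style inference: run the same prompt under
# several RoPE bases, then aggregate the candidate outputs.
from dataclasses import dataclass

@dataclass
class RunResult:
    base: float
    answer: str
    confidence: float  # higher = the model was more certain about this answer

def run_with_base(prompt: str, base: float) -> RunResult:
    # Stub: in a real system this would re-run LLM inference with RoPE base `base`.
    toy = {10000.0: ("call weather_api(city='Paris')", 0.61),
           18000.0: ("call weather_api(city='Paris')", 0.83),
           25000.0: ("call calendar_api()", 0.40)}
    answer, conf = toy[base]
    return RunResult(base, answer, conf)

def attention_buckets_style_inference(prompt: str, bases: list[float]) -> str:
    # The runs are independent, so they could be batched or executed in parallel.
    results = [run_with_base(prompt, b) for b in bases]
    best = max(results, key=lambda r: r.confidence)  # assumed aggregation rule
    return best.answer

print(attention_buckets_style_inference("Which tool should I call?",
                                        [10000.0, 18000.0, 25000.0]))
```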

State-of-the-Art Benchmarks Achieved

The proposed method was evaluated on a widely used tool-use benchmark, with notable results. Augmenting a 7-billion-parameter open-source model with Attention Buckets achieved state-of-the-art performance, matching that of the much larger GPT-4. Furthermore, when combined with various reasoning methods, it improved over the corresponding baselines without Attention Buckets. These results mark a significant step forward in the tool-use proficiency of open LLMs and motivate further research into their underlying context-awareness capabilities.

Broader Implications for Retrieval-Augmented Generation Tasks

Because Attention Buckets enhances context awareness, its usefulness extends beyond tool use. It also shows promise for open-domain question answering (ODQA), which likewise demands a high level of contextual comprehension. In experiments on popular ODQA benchmarks, Llama-2-7B augmented with Attention Buckets outperformed dedicated QA models. The chosen RoPE bases, and the search algorithm used to select them, also proved effective, suggesting wide applicability to tasks that depend on thorough use of the context.
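
The toy sketch below illustrates the kind of search this implies: greedily choosing a small set of bases whose attention peaks jointly cover all context positions. The synthetic waveform, the scoring criterion, and the candidate bases are placeholders assumed for illustration, not measurements from any real model or the paper's exact algorithm.

```python
# Toy greedy search over candidate RoPE bases: maximize the worst-covered
# position of the combined (upper-envelope) attention waveform.
import numpy as np

def synthetic_waveform(base: float, num_positions: int = 512) -> np.ndarray:
    # Placeholder for "attention received at each position under this base".
    pos = np.arange(num_positions)
    return 0.5 + 0.5 * np.cos(2 * np.pi * pos * (10000.0 / base) / 128.0)

def greedy_select_bases(candidates: list[float], k: int = 3) -> list[float]:
    """Greedily pick k bases so their combined waveform covers all positions well."""
    chosen: list[float] = []
    envelope = None
    for _ in range(k):
        def coverage(b: float) -> float:
            wave = synthetic_waveform(b)
            merged = wave if envelope is None else np.maximum(envelope, wave)
            return float(merged.min())  # attention at the worst-covered position
        best = max((b for b in candidates if b not in chosen), key=coverage)
        chosen.append(best)
        wave = synthetic_waveform(best)
        envelope = wave if envelope is None else np.maximum(envelope, wave)
    return chosen

print(greedy_select_bases([10000.0, 12000.0, 15000.0, 18000.0, 22000.0, 26000.0]))
```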

Authors (8)
  1. Yuhan Chen (39 papers)
  2. Ang Lv (19 papers)
  3. Ting-En Lin (28 papers)
  4. Changyu Chen (19 papers)
  5. Yuchuan Wu (33 papers)
  6. Fei Huang (408 papers)
  7. Yongbin Li (128 papers)
  8. Rui Yan (250 papers)
Citations (19)