Multi-Agent Collaboration Enhances LLMs for Long-Text Processing
Introduction
Large language models (LLMs) have made significant strides in natural language understanding and problem-solving. However, their ability to process long texts remains a considerable challenge, largely owing to computational constraints and degraded attention quality over extended sequences. This paper introduces LONG AGENT, a multi-agent collaboration approach that scales LLMs to handle documents exceeding 100,000 tokens effectively.
Long Text Handling in LLMs
Strategies to extend LLMs' context windows have traditionally revolved around improved positional encodings and mechanisms for managing longer inputs without significant loss of long-range dependency tracking. Despite these advances, models still struggle to process long texts efficiently, a limitation LONG AGENT seeks to address through a collaborative agent-based framework.
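To make the positional-encoding strategies concrete, here is a minimal sketch of one widely used trick from prior work, position interpolation for rotary embeddings (RoPE): positions are compressed by a fixed factor so that a longer evaluation context maps back into the position range seen during training. The dimensions, base, and context lengths below are illustrative assumptions, and this describes background techniques, not LONG AGENT's own mechanism.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotary-embedding angles with optional position interpolation.

    scale < 1.0 compresses positions so that a longer sequence maps back
    into the position range the model saw during training.
    """
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions * scale, inv_freq)  # shape: (seq_len, dim // 2)

# Model trained on 4k positions, evaluated on 16k: compress positions by 4k/16k.
train_ctx, eval_ctx = 4096, 16384
angles = rope_angles(np.arange(eval_ctx), dim=128, scale=train_ctx / eval_ctx)
```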
LONG AGENT Architecture
LONG AGENT comprises a leader and multiple member agents, each responsible for analyzing a segment of the input text and contributing to a collective understanding. The leader interprets the user's intent and orchestrates discussion among the members to consolidate information and derive answers to complex queries. The architecture also introduces an inter-member communication mechanism for resolving conflicting information, addressing the hallucinations individual models commonly produce when interpreting extensive data.
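The following is a minimal sketch of how such a leader-member loop might look in code, under stated assumptions: the `llm` callables, the character-based chunking, the chunk size, the round limit, and the conflict-resolution prompt are all placeholders rather than the paper's exact protocol.

```python
from dataclasses import dataclass
from typing import Callable

CHUNK_CHARS = 16_000  # illustrative segment size (characters stand in for tokens)

@dataclass
class Member:
    llm: Callable[[str], str]  # the member's LLM: prompt -> answer
    chunk: str                 # the text segment this agent is responsible for

    def answer(self, question: str) -> str:
        return self.llm(f"Context:\n{self.chunk}\n\nQuestion: {question}")

def chunk_text(text: str, size: int = CHUNK_CHARS) -> list[str]:
    # Naive character-based splitting; a real system would chunk by tokens.
    return [text[i:i + size] for i in range(0, len(text), size)]

def leader_answer(leader_llm: Callable[[str], str],
                  members: list[Member],
                  question: str,
                  max_rounds: int = 3) -> str:
    """Leader collects member answers; on conflict, it shares the competing
    claims with all members and asks them to re-check their own segments."""
    for _ in range(max_rounds):
        answers = [m.answer(question) for m in members]
        distinct = sorted(set(answers))
        if len(distinct) == 1:  # consensus reached
            return distinct[0]
        # Inter-member communication: expose the conflict, then re-ask.
        question = (f"{question}\nPeers gave conflicting answers: "
                    f"{'; '.join(distinct)}\nRe-answer using ONLY your segment.")
    # No consensus within the round budget: leader synthesizes a verdict.
    return leader_llm(f"Question: {question}\nMember answers: {answers}\n"
                      "Choose the best-supported answer.")
```

The design point this sketch highlights is that only short member answers, never the raw segments, flow back to the leader, which keeps the leader's own context small regardless of document length.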
Implementation and Evaluation
The paper evaluates LONG AGENT on existing benchmarks such as Needle in a Haystack PLUS and on synthetic tasks designed to probe long-text processing. Experimental results indicate that LONG AGENT, instantiated with a 7B-parameter LLaMA model, outperforms established models, including GPT-4, on tasks requiring comprehension of lengthy texts. The authors attribute this to processing segments in parallel, which simplifies each agent's task and allows larger contexts to be handled without a corresponding increase in per-agent computational demand.
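The parallelism claim can be illustrated with a small sketch that fans the same question out to every member concurrently; it reuses the hypothetical Member class from the architecture sketch above, and the worker count is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

def query_members_parallel(members: list, question: str, max_workers: int = 8) -> list[str]:
    """Each member sees only its own segment, so queries are independent
    and can run concurrently; wall-clock latency stays roughly flat even
    as the document (and hence the member count) grows."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda m: m.answer(question), members))
```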
Efficiency and Scalability
A critical examination of LONG AGENT shows that its inference time grows linearly with text length, in contrast to full-attention mechanisms, whose cost grows quadratically. This scalability, together with a reduced memory footprint, is a significant advantage in practical applications that require processing extensive documents.
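A back-of-the-envelope comparison makes the complexity gap concrete. Assume full attention costs on the order of n² token-pair interactions, while a chunked, agent-based scheme costs roughly n × c for a fixed per-member chunk of c tokens; the 4,096-token chunk below is an assumption for illustration, not a figure from the paper.

```python
# Full attention scales with n^2; chunking scales with n: with k = n / c
# members, each attending over a fixed chunk c, cost ~ (n / c) * c^2 = n * c.
chunk = 4096  # illustrative per-member context size

for n in (8_192, 32_768, 131_072):
    full = n * n                            # full-attention pair count
    chunked = (n // chunk) * chunk * chunk  # = n * chunk
    print(f"n={n:>7}: full/chunked cost ratio = {full / chunked:.0f}x")
# Prints ratios of 2x, 8x, and 32x: the gap (n / chunk) widens linearly
# with document length, matching the linear-vs-quadratic distinction above.
```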
Conclusion and Future Work
LONG AGENT marks a step toward using multi-agent systems to improve LLM performance on long-text processing. By distributing the cognitive load across multiple agents and using a leader to synthesize their insights, it demonstrates notable improvements over conventional single-model approaches on extended inputs. Future research directions include optimizing the leader's decision-making process and broadening the architecture to a wider variety of tasks, potentially extending LLM applications in areas previously constrained by context length.