Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System

Published 10 Oct 2024 in cs.CL and cs.AI | (2410.08115v2)

Abstract: LLM-based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving, yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods. We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness in LLM-based MAS through LLM training. Optima employs an iterative generate, rank, select, and train paradigm with a reward function balancing task performance, token efficiency, and communication readability. We explore various RL algorithms, including Supervised Fine-Tuning, Direct Preference Optimization, and their hybrid approaches, providing insights into their effectiveness-efficiency trade-offs. We integrate Monte Carlo Tree Search-inspired techniques for DPO data generation, treating conversation turns as tree nodes to explore diverse interaction paths. Evaluated on common multi-agent tasks, including information-asymmetric question answering and complex reasoning, Optima shows consistent and substantial improvements over single-agent baselines and vanilla MAS based on Llama 3 8B, achieving up to 2.8x performance gain with less than 10% tokens on tasks requiring heavy information exchange. Moreover, Optima's efficiency gains open new possibilities for leveraging inference-compute more effectively, leading to improved inference-time scaling laws. By addressing fundamental challenges in LLM-based MAS, Optima shows the potential towards scalable, efficient, and effective MAS (https://chenweize1998.github.io/optima-project-page).

Summary

  • The paper presents Optima to enhance multi-agent communication and task performance through an innovative iterative training paradigm.
  • It employs a generate, rank, select, and train approach with a balanced reward function, achieving up to 2.8x performance improvement and reduced token usage.
  • Experimental results highlight improved scalability and efficiency, paving the way for more effective LLM-based multi-agent systems in resource-constrained environments.

Overview of Optima: Optimizing Multi-Agent Systems with LLMs

The paper presents Optima, a comprehensive framework designed to enhance the effectiveness and efficiency of LLM-based multi-agent systems (MAS). Challenges such as low communication efficiency, poor scalability, and inadequate parameter-updating methods are addressed through an iterative training paradigm. Optima integrates several training techniques, including Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), to optimize inter-agent communication and task performance.

Methodology

Optima employs a generate, rank, select, and train process that iterates to progressively improve agent behavior. A reward function balancing task performance, token efficiency, and communication readability guides the optimization:

  • Reward Function: Balances task-specific performance, a normalized token-count penalty, and a language-model loss term that encourages readable inter-agent messages.
  • Monte Carlo Tree Search (MCTS)-Inspired Data Generation: Treats conversation turns as tree nodes to explore diverse interaction paths and produce high-quality DPO training data.
  • Framework Variants: Includes iterative SFT (iSFT), iterative DPO (iDPO), and a hybrid approach combining both (iSFT-DPO).
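The generate-rank-select-train loop and the composite reward above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weights, the `max_tokens` normalizer, and the `model.sample` / `model.finetune` / `conv.stats` interfaces are all assumptions made for the sketch.

```python
# Hypothetical weights; Optima balances these three terms, but the
# exact coefficients here are an assumption.
W_TASK, W_TOKEN, W_LOSS = 1.0, 0.5, 0.5

def reward(task_score, n_tokens, lm_loss, max_tokens=512):
    """Composite reward: task performance minus penalties for token
    usage and for unreadable (high language-model loss) messages."""
    token_penalty = n_tokens / max_tokens  # normalized token count
    return W_TASK * task_score - W_TOKEN * token_penalty - W_LOSS * lm_loss

def iterate(model, tasks, n_samples=8, top_k=2, rounds=3):
    """Generate -> rank -> select -> train, repeated over several rounds.
    `model.sample`, `conv.stats`, and `model.finetune` are placeholders."""
    for _ in range(rounds):
        data = []
        for task in tasks:
            convs = [model.sample(task) for _ in range(n_samples)]      # generate
            ranked = sorted(convs, key=lambda c: reward(*c.stats),
                            reverse=True)                               # rank
            data.extend(ranked[:top_k])                                 # select
        model = model.finetune(data)                                    # train (e.g., SFT)
    return model
```

In the iDPO and iSFT-DPO variants, the training step would consume ranked pairs rather than only the top conversations, but the outer loop has the same shape.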

Results

Optima achieves notable improvements over established baselines across various tasks, particularly in multi-agent settings that involve information exchange and reasoning:

  • Performance Gains: Demonstrated up to a 2.8x improvement in task performance with reduced token usage.
  • Token Efficiency: A consistent reduction in required inference tokens was observed, enhancing computational efficiency.
  • Inference-Time Scaling: Optima's efficiency supports improved inference-time scaling laws, allowing for more effective use of inference compute.

Implications and Future Directions

The implications of this research are twofold, addressing both theoretical advancements and practical applications:

  • Scalability and Efficiency: The efficiency gains suggest potential for scaling LLM-based MAS in real-world applications where resource constraints are critical.
  • Inference Scaling Laws: By reducing token requirements, Optima paves the way for more advanced inference techniques, such as self-consistency with optimized sampling.
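To illustrate why cheaper agent runs matter for techniques like self-consistency, here is a toy sketch of majority voting under a fixed token budget. All numbers, the per-run cost, and the sampler are hypothetical; the point is only that a cheaper run buys proportionally more votes from the same budget.

```python
from collections import Counter
from itertools import cycle

def self_consistency(sample_fn, budget_tokens, avg_tokens_per_run):
    """Spend a fixed inference budget on repeated samples and
    majority-vote the answers. If each run is ~10x cheaper (as with
    Optima's efficiency gains), the same budget buys ~10x more votes."""
    n_runs = max(1, budget_tokens // avg_tokens_per_run)
    answers = [sample_fn() for _ in range(n_runs)]
    return Counter(answers).most_common(1)[0][0], n_runs

# Deterministic toy sampler: 3 of every 4 answers are "A".
toy = cycle(["A", "A", "A", "B"])
sample = lambda: next(toy)

# Hypothetical budget: 4000 tokens at 100 tokens/run yields 40 votes,
# versus only 4 votes at a vanilla 1000 tokens/run.
ans, runs = self_consistency(sample, budget_tokens=4000, avg_tokens_per_run=100)
# → ans == "A", runs == 40
```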

Conclusion

Optima establishes a foundation for scalable, efficient, and effective MAS by addressing fundamental challenges in LLM-based systems. Future research can explore leveraging Optima's principles in larger models and more diverse multi-agent configurations, potentially leading to further breakthroughs in AI collaboration and communication.
