
Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization (2510.22477v1)

Published 26 Oct 2025 in cs.MA and cs.AI

Abstract: To combat the prohibitive communication costs of "free-for-all" multi-agent systems (MAS), we introduce Agent-GSPO, a framework that directly optimizes for token economy using sequence-level reinforcement learning. Agent-GSPO leverages the stable and memory-efficient Group Sequence Policy Optimization (GSPO) algorithm to train agents on a communication-aware reward that explicitly penalizes verbosity. Across seven reasoning benchmarks, Agent-GSPO not only achieves new state-of-the-art performance but does so with a fraction of the token consumption of existing methods. By fostering emergent strategies like "strategic silence," our approach provides a practical blueprint for developing scalable and economically viable multi-agent systems.
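To illustrate what a communication-aware reward that penalizes verbosity might look like, here is a minimal sketch. The abstract does not give the reward's functional form, so everything here is an assumption made for illustration: the linear per-token cost, the `token_penalty` value, the function names, and the toy group of rollouts. The group-wise standardization of rewards is likewise only a common pattern in group-based policy optimization, not a statement of the paper's exact objective.

```python
from statistics import mean, pstdev


def communication_aware_reward(correct: bool, tokens_used: int,
                               token_penalty: float = 0.001) -> float:
    """Hypothetical reward: 1 for a correct answer, minus a linear token cost."""
    task_reward = 1.0 if correct else 0.0
    return task_reward - token_penalty * tokens_used


def group_normalized_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group of four rollouts for one task: (answer correct?, tokens communicated).
# These numbers are invented purely for illustration.
group = [(True, 120), (True, 40), (False, 0), (False, 300)]

rewards = [communication_aware_reward(c, t) for c, t in group]
advantages = group_normalized_advantages(rewards)

for (correct, tokens), r, a in zip(group, rewards, advantages):
    print(f"correct={correct!s:5} tokens={tokens:3d} reward={r:+.3f} advantage={a:+.3f}")
```

Under these assumed numbers, the concise correct rollout receives the largest advantage, and a failed rollout that stays silent is penalized less than a failed rollout that spends 300 tokens, which is one way a preference for "strategic silence" could emerge from such a reward.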
