"Thought Communication in Multiagent Collaboration" Summary
- The paper introduces ThoughtComm, a method that enables direct exchange of latent thoughts among agents, overcoming the limitations of token-based communication.
- It employs a sparsity-regularized autoencoder to extract and structure latent thoughts, integrating them as prefixes for enhanced response generation.
- Experiments on math reasoning benchmarks demonstrate improved accuracy and consensus among agents, validating the framework’s scalability and efficiency.
Introduction
This paper challenges the traditional communication paradigm in multi-agent systems (MAS), which relies primarily on natural language exchanged among LLMs as tokens or embeddings. Natural language imposes inherent limitations: it is sequential, often ambiguous, and can express latent thoughts only indirectly. The authors propose "thought communication," a novel paradigm in which agents share latent thoughts directly, bypassing the constraints of language-based exchange. This innovation is rooted in a formalization of inter-agent communication as a latent variable model, where agent states arise from an unknown function of underlying thoughts. The authors establish theoretical guarantees for recovering the shared and private latent thoughts of agents, along with their structure.
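In the notation common to such latent variable models, the generative assumption can be sketched as follows (the symbols here are illustrative and may differ from the paper's own notation):

```latex
x_i = f_i\big(z_{S_i}\big), \qquad i = 1, \dots, n
```

where x_i is agent i's model state, z is the vector of latent thoughts, S_i is the index set of thoughts relevant to agent i (overlaps between the S_i correspond to shared thoughts, the remainder to private ones), and f_i is an unknown nonlinear mixing function. Identifiability then means recovering z and the sets S_i from the observed states x_i, up to benign ambiguities such as permutation.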
ThoughtComm Framework
Latent Thought Extraction:
The ThoughtComm framework begins by encoding the agents' model states (concatenated per communication round) into a common latent space using a sparsity-regularized autoencoder. The sparsity penalty reflects the assumption of a minimal shared cognitive basis among agents and enables the extraction of latent thoughts with identifiable structure.
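A minimal sketch of this extraction step in NumPy, with illustrative dimensions and randomly initialized weights (the paper's actual architecture and training procedure are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 3 agents, hidden size 8, latent size 6.
n_agents, d_hidden, d_latent = 3, 8, 6
d_in = n_agents * d_hidden

# Randomly initialized encoder/decoder weights (training loop omitted).
W_enc = rng.normal(0.0, 0.1, (d_latent, d_in))
W_dec = rng.normal(0.0, 0.1, (d_in, d_latent))

def encode(x):
    # Map the concatenated agent states to a latent thought vector.
    return np.tanh(W_enc @ x)

def decode(z):
    return W_dec @ z

def loss(x, lam=0.1):
    # Reconstruction error plus an L1 penalty that pushes latent
    # coordinates toward zero, yielding a sparse thought code.
    z = encode(x)
    recon = np.mean((decode(z) - x) ** 2)
    sparsity = lam * np.sum(np.abs(z))
    return recon + sparsity

# Concatenated per-round model states of all agents, as one vector.
x = rng.normal(size=d_in)
z = encode(x)
```

Minimizing this objective over many rounds would drive most latent coordinates to zero, so each recovered thought loads on only the agents that actually hold it.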
Structural Information Integration:
Latent thoughts, once extracted, are redistributed among agents based on structured dependency. This involves:
- Identifying relevant thoughts for each agent from the latent space,
- Grouping thoughts by how widely they apply across agent interactions (shared vs. private),
- Reweighting these thoughts to reflect their consensus among agents before reintroducing them into their models as prefixes in subsequent communication rounds.
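The grouping and reweighting steps above can be sketched with a toy support matrix; the matrix and the consensus weighting used here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

# Hypothetical support matrix: rows = agents, cols = latent thoughts;
# S[i, k] = 1 means thought k is relevant to agent i.
S = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
])

counts = S.sum(axis=0)     # how many agents hold each thought
shared = counts > 1        # thoughts held by more than one agent
private = counts == 1      # thoughts held by exactly one agent

# Consensus weight: fraction of agents holding each thought.
weights = counts / S.shape[0]

def thoughts_for_agent(i, z):
    # Select agent i's relevant thoughts, reweighted by consensus.
    mask = S[i].astype(bool)
    return (weights * z)[mask]

z = np.array([0.5, -1.0, 2.0, 0.3])  # toy latent thought vector
agent0_thoughts = thoughts_for_agent(0, z)
```

Here thought 0 is fully shared (weight 1), thought 3 is shared by two of three agents, and thoughts 1 and 2 are private, so widely agreed-upon thoughts dominate what each agent receives.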
Prefix Adaptation:
Reintegration into agent operations is handled through “prefix adaptation.” An adapter transforms selected latent thoughts into a prefix vector, which is prepended to an agent’s input context, guiding the generation of the agent's subsequent responses. This approach supports continuous latent guidance in multi-agent reasoning, extending beyond fixed natural language exchanges.
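A minimal sketch of prefix adaptation, assuming a simple linear adapter (the adapter architecture and dimensions are hypothetical stand-ins for whatever the paper actually uses):

```python
import numpy as np

rng = np.random.default_rng(1)

d_latent, d_model, prefix_len = 6, 16, 4

# Hypothetical adapter: a linear map from the latent thought vector
# to `prefix_len` soft-prompt vectors in the model's embedding space.
W_adapt = rng.normal(0.0, 0.1, (prefix_len * d_model, d_latent))

def make_prefix(z):
    return (W_adapt @ z).reshape(prefix_len, d_model)

def prepend_prefix(prefix, token_embeds):
    # The prefix is concatenated ahead of the token embeddings, so the
    # agent conditions on latent thoughts without emitting any tokens.
    return np.concatenate([prefix, token_embeds], axis=0)

z = rng.normal(size=d_latent)          # selected latent thoughts
tokens = rng.normal(size=(10, d_model))  # embedded input context
ctx = prepend_prefix(make_prefix(z), tokens)
print(ctx.shape)  # (14, 16)
```

Because the prefix lives in the model's continuous embedding space rather than the token vocabulary, the guidance it carries is not restricted to what natural language can express.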
Theoretical Underpinnings
The authors provide identifiability results guaranteeing that the recovered latent thoughts faithfully reflect the agents' underlying cognitive processes:
- Shared Thoughts: The framework guarantees that thoughts shared between any two agents are recoverable, disentangled from private thoughts.
- Private Thoughts: Theories also support recovering private thoughts, which are uniquely held by individual agents.
- Thought Structure: The structure linking latent thoughts to specific agents, indicating which agents share which thoughts, is identifiable up to permutation.
These identifiability results are grounded in classical theory but extend into the nonlinear, high-dimensional interaction spaces typical of LLMs.
Experiments
Experiments spanning synthetic and real-world benchmarks validate the advantages thought communication offers over previous approaches. Tests on math reasoning datasets (e.g., MATH and GSM8K) demonstrate significant improvements in accuracy and consensus among agents, across diverse LLM models and sizes. These experiments establish that ThoughtComm not only increases collaborative effectiveness but also scales efficiently even with growing numbers of agents or debate rounds.
Conclusion
By formalizing thought as a direct communication medium among agents, ThoughtComm overcomes the inherent limitations of language-based MAS. Although practical constraints around accessing model states remain, the paper proposes embedding-based alternatives to broaden applicability. Future work could extend the paradigm to non-textual modalities, suggesting potential well beyond text and natural language and further enriching multi-agent collaborative capabilities. Overall, the paper provides a strong theoretical and practical foundation for the proposed communication paradigm, offering a scalable and efficient approach to enhancing collective intelligence in MAS.