TarMAC: Targeted Multi-Agent Communication (1810.11187v2)

Published 26 Oct 2018 in cs.LG, cs.AI, cs.MA, and stat.ML

Abstract: We propose a targeted communication architecture for multi-agent reinforcement learning, where agents learn both what messages to send and whom to address them to while performing cooperative tasks in partially-observable environments. This targeting behavior is learnt solely from downstream task-specific reward without any communication supervision. We additionally augment this with a multi-round communication approach where agents coordinate via multiple rounds of communication before taking actions in the environment. We evaluate our approach on a diverse set of cooperative multi-agent tasks, of varying difficulties, with varying number of agents, in a variety of environments ranging from 2D grid layouts of shapes and simulated traffic junctions to 3D indoor environments, and demonstrate the benefits of targeted and multi-round communication. Moreover, we show that the targeted communication strategies learned by agents are interpretable and intuitive. Finally, we show that our architecture can be easily extended to mixed and competitive environments, leading to improved performance and sample complexity over recent state-of-the-art approaches.


TarMAC: Targeted Multi-Agent Communication in Reinforcement Learning

The paper "TarMAC: Targeted Multi-Agent Communication" presents an architecture for communication among agents in multi-agent reinforcement learning (MARL), particularly in partially observable environments. TarMAC agents learn both what to say and whom to address, and they learn this from task reward alone, without any communication supervision.

Key Contributions

TarMAC's main contributions are targeted communication and multi-round interaction among agents. The authors propose a soft attention mechanism that lets agents direct messages to specific peers rather than broadcasting uniformly to everyone. Each sender attaches a signature to its message, each receiver produces a query from its own current state, and the match between query and signature determines how much weight the receiver gives to that sender's message.
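The attention step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `softmax` and `targeted_aggregate` are hypothetical, the scaled dot-product form is assumed, and the queries, signatures, and values are passed in directly rather than produced by learned networks.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def targeted_aggregate(queries, signatures, values):
    """Aggregate messages with soft attention (hypothetical helper).

    queries:    (N, d_k) one query per receiving agent
    signatures: (N, d_k) one signature per sending agent
    values:     (N, d_v) one message body per sending agent
    Returns (attn, aggregated), where attn[j, i] is the weight
    receiver j places on sender i, and aggregated[j] is the
    weighted sum of message bodies that receiver j ends up with.
    """
    d_k = signatures.shape[-1]
    # Score every (receiver, sender) pair by query-signature similarity.
    scores = queries @ signatures.T / np.sqrt(d_k)
    # Normalize over senders so each receiver's weights sum to 1.
    attn = softmax(scores, axis=1)
    return attn, attn @ values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(4, 16))
    k = rng.normal(size=(4, 16))
    v = rng.normal(size=(4, 32))
    attn, agg = targeted_aggregate(q, k, v)
    print(attn.round(2))
```

Because the weights are a softmax rather than a hard selection, every step is differentiable, which is what allows the targeting behavior to be trained from task reward alone.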

The paper also introduces a multi-round communication protocol in which agents exchange messages over several rounds before committing to an action in the environment. Each round lets an agent update its internal state with what it has just received, so later messages can build on earlier ones. This helps in complex tasks where a single round of communication is not enough to coordinate.
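A rough sketch of how several rounds of targeted communication might be chained before acting is shown below. It is an illustration only: the linear projections, the `tanh` state update, and the names `init_params` and `multi_round_communicate` are assumptions made for this example, and the random weights stand in for parameters that would be learned in practice.

```python
import numpy as np

def init_params(d_h, d_k, d_v, seed=0):
    """Random projection matrices (stand-ins for learned weights)."""
    rng = np.random.default_rng(seed)
    W_q = rng.normal(scale=0.1, size=(d_h, d_k))        # state -> query
    W_k = rng.normal(scale=0.1, size=(d_h, d_k))        # state -> signature
    W_v = rng.normal(scale=0.1, size=(d_h, d_v))        # state -> message body
    W_h = rng.normal(scale=0.1, size=(d_h + d_v, d_h))  # state update
    return W_q, W_k, W_v, W_h

def multi_round_communicate(h, params, rounds=2):
    """Run `rounds` of targeted communication over agent states h.

    h: (N, d_h) hidden state of each agent.
    Returns the updated states and the attention matrix of each round.
    """
    W_q, W_k, W_v, W_h = params
    d_k = W_k.shape[1]
    attn_per_round = []
    for _ in range(rounds):
        q, k, v = h @ W_q, h @ W_k, h @ W_v
        scores = q @ k.T / np.sqrt(d_k)
        scores = scores - scores.max(axis=1, keepdims=True)
        attn = np.exp(scores)
        attn = attn / attn.sum(axis=1, keepdims=True)
        c = attn @ v  # aggregated incoming message per agent
        # Assumed update rule: mix the old state with the received message.
        h = np.tanh(np.concatenate([h, c], axis=1) @ W_h)
        attn_per_round.append(attn)
    return h, attn_per_round

if __name__ == "__main__":
    params = init_params(d_h=8, d_k=4, d_v=6)
    h0 = np.random.default_rng(1).normal(size=(3, 8))
    h_final, attns = multi_round_communicate(h0, params, rounds=3)
    print(h_final.shape, len(attns))
```

Only after the final round would the updated state be passed to the policy to choose an environment action; the number of rounds is a hyperparameter in this sketch.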

Experimental Validation

TarMAC was evaluated across a range of environments, from 2D grid layouts of shapes and simulated traffic junctions to navigation tasks in 3D indoor environments such as House3D. The results in these settings point to three benefits of the architecture:

Improved Performance: Targeted communication yields markedly better task performance than non-targeted communication or no-communication baselines.
Adaptability and Scalability: The architecture adapts to varying team sizes and task difficulties, which the evaluation covers by changing the number of agents across tasks.
Interpretability: The learned attention mechanisms provide an interpretable framework, allowing insights into which agents communicate with whom and the content of these exchanges, a feature notably advantageous for diagnostic purposes.
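To make the interpretability point concrete, the sketch below reads a learned attention matrix and reports, for each receiver, the sender it weights most heavily. The helper `strongest_senders` and the example matrix are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import numpy as np

def strongest_senders(attn, agent_names, ignore_self=True):
    """Map each receiver to the sender it attends to most.

    attn[j, i] is the weight receiver j places on sender i.
    With ignore_self=True the diagonal is masked, so an agent's
    attention to its own message is not reported.
    """
    attn = np.asarray(attn, dtype=float).copy()
    if ignore_self:
        np.fill_diagonal(attn, -np.inf)
    top = attn.argmax(axis=1)
    return {agent_names[j]: agent_names[int(i)] for j, i in enumerate(top)}

if __name__ == "__main__":
    # Illustrative weights only, not data from the paper.
    example = [[0.7, 0.2, 0.1],
               [0.1, 0.1, 0.8],
               [0.5, 0.3, 0.2]]
    print(strongest_senders(example, ["a", "b", "c"]))
```

Inspecting these mappings over the course of an episode is one simple way to check whether agents attend to the peers one would expect for the task.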

Implications and Future Directions

Targeted communication matters for MARL because it lets agents exchange information selectively instead of flooding every peer with every message. This can support more efficient cooperative strategies, particularly in scenarios that demand precise coordination, such as autonomous vehicle platooning or distributed sensor networks.

While TarMAC shows the potential of soft attention-based communication, the paper also points to several future directions. It explores extending the architecture to mixed and competitive scenarios, and integration with ongoing advances in planning networks and spatial memory systems is another avenue. A further possible extension is to move from continuous vectors to a discrete, symbol-based communication protocol, which could help bridge machine-to-machine and human-machine interaction.

In conclusion, TarMAC offers a method for targeted communication in MARL that improves cooperation among agents and is learned end to end from task reward. The findings suggest extensions and applications in other domains that require coordinated multi-agent behavior.


Authors (7)

Abhishek Das (61 papers)
Joshua Romoff (17 papers)
Dhruv Batra (160 papers)
Devi Parikh (129 papers)
Michael Rabbat (64 papers)
Joelle Pineau (123 papers)
Théophile Gervet (3 papers)

Citations (347)
