ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind (2111.09189v2)

Published 15 Oct 2021 in cs.MA, cs.AI, and cs.LG

Abstract: Being able to predict the mental states of others is a key factor to effective social interaction. It is also crucial for distributed multi-agent systems, where agents are required to communicate and cooperate. In this paper, we introduce such an important social-cognitive skill, i.e. Theory of Mind (ToM), to build socially intelligent agents who are able to communicate and cooperate effectively to accomplish challenging tasks. With ToM, each agent is capable of inferring the mental states and intentions of others according to its (local) observation. Based on the inferred states, the agents decide "when" and with "whom" to share their intentions. With the information observed, inferred, and received, the agents decide their sub-goals and reach a consensus among the team. In the end, the low-level executors independently take primitive actions to accomplish the sub-goals. We demonstrate the idea in two typical target-oriented multi-agent tasks: cooperative navigation and multi-sensor target coverage. The experiments show that the proposed model not only outperforms the state-of-the-art methods on reward and communication efficiency, but also shows good generalization across different scales of the environment.

Authors (4)

Yuanfei Wang (6 papers)
Fangwei Zhong (27 papers)
Jing Xu (244 papers)
Yizhou Wang (162 papers)

Citations (62)

Summary

Overview of "ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind"

The paper "ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind" introduces a novel approach to enhance target-oriented multi-agent systems through the incorporation of Theory of Mind (ToM). The authors propose a framework called ToM2C, which aims to improve the performance of agents in collaborative scenarios by enabling them to infer the mental states and intentions of their peers. This inference-based mechanism allows agents to decide strategically on communication partners and timing, ultimately leading to efficient goal achievement in challenging tasks.

Key Components and Methodology

The ToM2C framework is structured around four key components: Observation Encoder, Theory of Mind Network (ToM Net), Message Sender, and Decision Maker. Each agent in this framework operates with a hierarchical approach, where high-level policies decide on sub-goals and low-level executors perform actions to achieve these sub-goals.

Observation Encoder: Utilizes attention mechanisms to encode local observations, ensuring scalability and effective processing of multi-target information.
Theory of Mind Network (ToM Net): Empowers agents to estimate the observations and intentions of others based on local input and global agent positioning. The ToM Net uses GRUs for temporal estimation and is reinforced through auxiliary tasks for observation prediction.
Message Sender: Employs Graph Neural Networks to establish communication links dynamically, driven by inferred intentions and necessity, thereby reducing communication overhead through strategic message sending.
Decision Maker: Integrates local, inferred, and communicated information to finalize action plans, thereby facilitating decentralized decision-making pertinent to the collective goals of the agents.
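To illustrate why attention helps the Observation Encoder scale with the number of targets, here is a minimal sketch of attention pooling in plain Python. It is not the authors' implementation: the function names `softmax` and `attention_pool`, the fixed `query` vector, and the scaled dot-product scoring are assumptions made for illustration (a real encoder would learn these parameters). The point is only that the pooled output has the same size however many targets are observed.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(target_feats, query):
    """Pool a variable-length list of target feature vectors into one vector.

    Hypothetical helper: scores each target by a scaled dot product with
    `query`, then returns the attention-weighted sum of the target features.
    """
    dim = len(query)
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in target_feats
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, target_feats))
        for i in range(dim)
    ]
    return pooled, weights

# The output size stays fixed whether the agent observes two or three targets.
query = [1.0, 0.0]
two_targets = [[1.0, 0.0], [0.0, 1.0]]
three_targets = two_targets + [[0.5, 0.5]]
pooled2, w2 = attention_pool(two_targets, query)
pooled3, w3 = attention_pool(three_targets, query)
```

Because the pooled vector has a fixed length, downstream modules such as the ToM Net or Decision Maker would not need to change when the number of targets changes, which is one plausible reading of the scalability claim.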

Experimental Results

Experiments conducted in cooperative navigation and multi-sensor multi-target coverage environments demonstrate that ToM2C outperforms several state-of-the-art multi-agent reinforcement learning (MARL) methods, including TarMAC, HiT-MAC, and MAPPO. The paper reports gains in both reward and communication efficiency, and examines the robustness of ToM2C through ablation studies and scalability tests across varied sensor-target configurations.

Quantitative evaluations indicate significant improvements in target coverage rates, and further analysis shows that ToM2C uses less communication bandwidth than the alternative methods. Although hierarchical approaches typically demand more communication, ToM2C keeps its cost low through its inference-driven communication reduction technique.
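The idea of sending messages only where they are needed can be sketched as pruning a directed communication graph. The example below is an assumption-laden illustration, not the paper's method: ToM2C learns its links with a Graph Neural Network, whereas here the hypothetical `edge_scores` matrix is written by hand, and `select_links`, `bandwidth_ratio`, and the threshold value are invented names and parameters.

```python
def select_links(edge_scores, threshold=0.5):
    """Return directed (sender, receiver) pairs whose score passes the threshold.

    edge_scores[i][j] is a hypothetical estimate of how useful it is for
    agent i to send a message to agent j. Self-links are ignored.
    """
    n = len(edge_scores)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and edge_scores[i][j] >= threshold
    ]

def bandwidth_ratio(links, n_agents):
    """Fraction of links used relative to full broadcast (every agent to every other)."""
    full = n_agents * (n_agents - 1)
    return len(links) / full if full else 0.0

# Hand-written scores for three agents (illustrative values only).
scores = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.7],
    [0.4, 0.3, 0.0],
]
links = select_links(scores)                      # [(0, 1), (1, 2)]
ratio = bandwidth_ratio(links, n_agents=len(scores))  # 2 of 6 possible links
```

In this toy case only two of the six possible directed links are kept, which is the kind of saving that a bandwidth comparison against always-broadcast baselines would measure.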

Implications and Future Outlook

From a theoretical perspective, this research underscores how inferring the mental states of other agents can improve cooperative behavior in multi-agent systems. Practically, ToM2C's communication methodology offers a route to deploying MARL in bandwidth-constrained environments, which is relevant to applications such as distributed sensor networks and autonomous navigation systems.

The authors suggest that future work could extend ToM2C to non-target-oriented settings, add automatic goal generation, and refine the communication reduction method to address label uncertainty in its supervised learning process. The paper also emphasizes the need for in-depth evaluations in diverse real-world scenarios to further validate the efficacy and applicability of ToM-based strategies in multi-agent cooperation.

GitHub

GitHub - UnrealTracking/ToM2C: The official implementation of "ToM2C: Target-oriented Multi-agent Communication and Cooperation with Theory of Mind" (ICLR 2022). (65 stars)