Learning to Schedule Communication in Multi-agent Reinforcement Learning (1902.01554v1)

Published 5 Feb 2019 in cs.AI, cs.LG, and cs.MA

Abstract: Many real-world reinforcement learning tasks require multiple agents to make sequential decisions under the agents' interaction, where well-coordinated actions among the agents are crucial to achieve the target goal better at these tasks. One way to accelerate the coordination effect is to enable multiple agents to communicate with each other in a distributed manner and behave as a group. In this paper, we study a practical scenario when (i) the communication bandwidth is limited and (ii) the agents share the communication medium so that only a restricted number of agents are able to simultaneously use the medium, as in the state-of-the-art wireless networking standards. This calls for a certain form of communication scheduling. In that regard, we propose a multi-agent deep reinforcement learning framework, called SchedNet, in which agents learn how to schedule themselves, how to encode the messages, and how to select actions based on received messages. SchedNet is capable of deciding which agents should be entitled to broadcasting their (encoded) messages, by learning the importance of each agent's partially observed information. We evaluate SchedNet against multiple baselines under two different applications, namely, cooperative communication and navigation, and predator-prey. Our experiments show a non-negligible performance gap between SchedNet and other mechanisms such as the ones without communication and with vanilla scheduling methods, e.g., round robin, ranging from 32% to 43%.


Learning to Schedule Communication in Multi-agent Reinforcement Learning

The paper "Learning to Schedule Communication in Multi-agent Reinforcement Learning" addresses communication in multi-agent reinforcement learning (MARL) under two practical constraints: bandwidth is limited, and the agents share a communication medium, so medium access control (MAC) is needed and only a restricted number of agents can transmit at the same time. The authors propose a framework named SchedNet in which agents learn how to schedule their own communication, how to encode messages, and how to act on the messages they receive. These constraints mirror those found in wireless networking environments.

Summary of Contributions

SchedNet combines the principles of centralized training and distributed execution (CTDE) to enhance the cooperative behavior of agents through effectively scheduled communication. The framework introduces a three-component architecture comprising:

Actor Network: Composed of per-agent modules that manage message encoding, action selection, and weight generation for scheduling prioritization.
Scheduler: Determines which agents can broadcast messages based on the calculated importance of their information, facilitating efficient utilization of limited communication resources.
Critic Network: Assists during centralized training by offering feedback that considers the global state information.
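
To make the data flow between these components concrete, here is a minimal sketch of a single execution step in plain Python. It is not the authors' implementation: the function `execution_step`, the toy stand-ins `toy_weight`, `toy_encode`, and `toy_act`, and the choice that every agent (including the sender) receives each broadcast message are all assumptions made for illustration. In the paper these modules are learned neural networks, and the critic is used only during centralized training, so it does not appear here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Observation = Sequence[float]
Message = List[float]


def execution_step(
    observations: Sequence[Observation],
    weight_fn: Callable[[Observation], float],
    encode_fn: Callable[[Observation], Message],
    act_fn: Callable[[Observation, Dict[int, Message]], int],
    k: int,
) -> Tuple[List[int], List[int]]:
    """Run one hypothetical decentralized step: weigh, schedule, broadcast, act."""
    # 1. Each agent derives a scheduling weight from its own partial observation.
    weights = [weight_fn(obs) for obs in observations]
    # 2. Scheduler: grant the medium to the k agents with the largest weights.
    ranked = sorted(range(len(weights)), key=lambda i: (-weights[i], i))
    scheduled = sorted(ranked[:k])
    # 3. Only scheduled agents encode and broadcast a message.
    broadcast = {i: encode_fn(observations[i]) for i in scheduled}
    # 4. Every agent picks an action from its observation plus received messages.
    actions = [act_fn(obs, broadcast) for obs in observations]
    return scheduled, actions


# Toy stand-ins for the learned modules (assumptions, not from the paper).
def toy_weight(obs: Observation) -> float:
    return sum(abs(x) for x in obs)


def toy_encode(obs: Observation) -> Message:
    return [1.0 if x > 0 else 0.0 for x in obs]


def toy_act(obs: Observation, received: Dict[int, Message]) -> int:
    totals = list(obs)
    for msg in received.values():
        totals = [t + m for t, m in zip(totals, msg)]
    return max(range(len(totals)), key=lambda j: totals[j])
```

Calling `execution_step(observations, toy_weight, toy_encode, toy_act, k=1)` returns the indices of the scheduled agents and one action per agent. Setting `k=0` gives the no-communication case, which is a convenient way to see how a broadcast message changes the other agents' choices.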

Methodology

SchedNet starts from the observation that real-world MARL scenarios often involve limited bandwidth and a shared medium, as in wireless communication systems. The system therefore selects only a subset of agents to communicate at each step, using techniques aligned with existing MAC protocols such as CSMA (Carrier Sense Multiple Access), which allows the scheduling to be carried out in a distributed manner.

The scheduling mechanism leverages a weight-based approach where:

Top(k) prioritizes agents with higher weight values.
Softmax(k) applies a probabilistic selection based on a softmax transformation of the weights.

These methods are intended to approximate practical wireless scheduling protocols while maintaining computational efficiency.
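
The two selection rules can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the paper's code: the function names `top_k_schedule` and `softmax_k_schedule` are hypothetical, ties in Top(k) are broken by the lower agent index, and Softmax(k) is implemented here as drawing k distinct agents one at a time with probability proportional to the exponential of each remaining agent's weight.

```python
import math
import random
from typing import List, Optional, Sequence


def top_k_schedule(weights: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k agents with the largest weights."""
    if not 0 <= k <= len(weights):
        raise ValueError("k must be between 0 and the number of agents")
    # Sort by descending weight; the index is a deterministic tie-breaker.
    ranked = sorted(range(len(weights)), key=lambda i: (-weights[i], i))
    return sorted(ranked[:k])


def softmax_k_schedule(
    weights: Sequence[float], k: int, rng: Optional[random.Random] = None
) -> List[int]:
    """Sample k distinct agents with softmax-weighted probabilities."""
    if not 0 <= k <= len(weights):
        raise ValueError("k must be between 0 and the number of agents")
    if rng is None:
        rng = random.Random()
    remaining = list(range(len(weights)))
    chosen: List[int] = []
    for _ in range(k):
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(weights[i] for i in remaining)
        scores = [math.exp(weights[i] - peak) for i in remaining]
        pick = rng.choices(remaining, weights=scores, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # sample without replacement
    return sorted(chosen)
```

For example, `top_k_schedule([0.1, 0.9, 0.4, 0.7], 2)` selects agents 1 and 3, while `softmax_k_schedule` with the same weights usually favors those agents but can occasionally grant the medium to a lower-weight agent.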

Experimental Validation

The authors evaluated SchedNet in two environments: Predator-Prey (PP) and Cooperative Communication and Navigation (CCN). In these experiments, SchedNet outperformed baselines such as IDQN and COMA, which do not use communication, as well as simple scheduling methods such as round robin.

In the PP environment, SchedNet-Top(1) exhibited a 43% performance improvement over round-robin scheduling.
In the CCN environment, SchedNet showed a similar trend, with a 32% improvement, again highlighting the value of learned scheduling in MARL settings.

The paper further illustrated how SchedNet's learning capabilities allow it to prioritize agents with more crucial observations automatically, adapting its scheduling strategy to maximize the collective reward in coordination tasks.

Implications and Future Directions

SchedNet lays groundwork for improving coordination in MARL tasks under communication constraints, which makes it relevant to distributed systems operating in constrained networking environments. As multi-agent systems become more common in mobile and IoT applications, frameworks like SchedNet could help keep inter-agent communication robust and efficient.

The research opens avenues for further exploration into more complex and dynamic scheduling scenarios, potentially involving varied network conditions or adapting to non-static agent abilities. Future work could also integrate recurrent neural networks (RNNs) within the SchedNet architecture to further tackle scenarios characterized by highly partial observability or non-stationary dynamics.


Authors (7)

Daewoo Kim (6 papers)
Sangwoo Moon (10 papers)
David Hostallero (1 paper)
Wan Ju Kang (5 papers)
Taeyoung Lee (59 papers)
Kyunghwan Son (8 papers)
Yung Yi (30 papers)

Citations (189)
