
Multiagent Cooperation and Competition with Deep Reinforcement Learning (1511.08779v1)

Published 27 Nov 2015 in cs.AI, cs.LG, and q-bio.NC

Abstract: Multiagent systems appear in most social, economical, and political situations. In the present work we extend the Deep Q-Learning Network architecture proposed by Google DeepMind to multiagent environments and investigate how two agents controlled by independent Deep Q-Networks interact in the classic videogame Pong. By manipulating the classical rewarding scheme of Pong we demonstrate how competitive and collaborative behaviors emerge. Competitive agents learn to play and score efficiently. Agents trained under collaborative rewarding schemes find an optimal strategy to keep the ball in the game as long as possible. We also describe the progression from competitive to collaborative behavior. The present work demonstrates that Deep Q-Networks can become a practical tool for studying the decentralized learning of multiagent systems living in highly complex environments.

Citations (807)

Summary

  • The paper introduces a multiagent extension of Deep Q-Learning using independent DQNs in Pong to explore cooperative and competitive dynamics.
  • It employs diverse reward schemes—fully competitive, fully cooperative, and transitional—to elicit distinct strategic behaviors and measure performance improvements.
  • The study demonstrates the scalability of decentralized learning and sets the stage for future research in complex, multiagent environments.

Multiagent Cooperation and Competition with Deep Reinforcement Learning

The paper "Multiagent Cooperation and Competition with Deep Reinforcement Learning" explores the dynamics of multiagent systems through the lens of reinforcement learning. Specifically, the research extends the Deep Q-Learning Network (DQN) architecture to multiagent environments and analyzes interaction behaviors in the classic Atari game Pong. The researchers employ various reward schemes to induce competitive and collaborative behaviors, providing a framework for studying the decentralized learning in multiagent systems operating in complex environments.

Overview of the Approach

Leveraging the DQN architecture, the paper captures the interplay between two agents controlled by independent DQNs. The input to each agent consists exclusively of raw screen images and the reward signal, without access to high-level abstractions or explicit game dynamics. This methodology is inspired by prior work from Google DeepMind, which demonstrated the capability of DQNs to achieve superhuman performance in single-agent Atari games.
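As a concrete illustration of that input pipeline, the sketch below shows the kind of frame preprocessing commonly paired with DQNs on Atari games (grayscale conversion and downscaling to 84×84). The exact parameters are assumptions modeled on the DeepMind single-agent setup, not details reported in this summary.

```python
import cv2
import numpy as np

def preprocess(frame, size=84):
    """Turn a raw RGB Pong frame into a small grayscale image.

    The 84x84 grayscale format mirrors the DeepMind single-agent pipeline;
    the paper is assumed to use something similar, but the exact values here
    are illustrative.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)                      # drop color channels
    small = cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA)
    return small.astype(np.float32) / 255.0                             # scale pixels to [0, 1]
```

Stacking a few consecutive preprocessed frames is the usual way to give the network enough temporal context to infer the ball's direction and speed from still images.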

Key Methodologies

Deep Q-Learning Extension

The goal is to find a policy that maximizes the accumulated long-term reward without explicit knowledge of the environment's dynamics or reward function. Each agent's DQN independently approximates its own action-value (Q) function, using convolutional neural networks to learn feature representations directly from raw pixels.
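A minimal sketch of what such a per-agent network might look like is given below. It follows the familiar DeepMind convolutional DQN layout, but the specific layer sizes and the use of PyTorch are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class AgentDQN(nn.Module):
    """Per-agent convolutional Q-network: stacked frames in, one Q-value per action out.

    Layer sizes follow the well-known DeepMind DQN layout and are illustrative;
    the paper's exact architecture may differ.
    """
    def __init__(self, n_actions, in_channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        # x: (batch, in_channels, 84, 84), pixel values in [0, 1]
        return self.net(x)

def td_target(reward, next_q_values, done, gamma=0.99):
    """Standard Q-learning bootstrap target: r + gamma * max_a' Q(s', a'), cut off at terminal states."""
    return reward + gamma * next_q_values.max(dim=1).values * (1.0 - done)
```

Each agent trains its own copy of such a network against its own reward signal, which is what makes the learning fully decentralized.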

Adaptation for Multiplayer

To facilitate multiplayer functionality, the original codebase was extended to support two-player interactions. The game state remains fully observable to both agents, ensuring symmetry in data access and game dynamics.
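The authors' extension builds on DeepMind's original code; the hypothetical wrapper below merely illustrates the resulting interface in Python: both paddles are controlled in the same step, and both agents receive the identical, fully observable frame. The underlying emulator object is assumed, not a real library API.

```python
class TwoPlayerPong:
    """Hypothetical wrapper illustrating the two-player interface described above.

    Both paddles act in the same step, and both agents observe the identical,
    fully visible screen; only the rewards are per-player.
    """
    def __init__(self, env):
        self.env = env  # assumed two-player Pong emulator, not a real library API

    def step(self, action_left, action_right):
        frame, (reward_left, reward_right), done = self.env.step(action_left, action_right)
        # Full observability: the same raw frame is handed to both agents.
        return (frame, frame), (reward_left, reward_right), done
```

Handing the same frame to both players makes the symmetry in data access explicit: any difference in learned behavior must come from the reward scheme, not from privileged observations.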

Rewarding Schemes

Three primary reward structures were investigated:

  1. Fully Competitive: In this zero-sum mode, the agent that scores receives a positive reward while the agent that missed the ball is penalized. This setup aims to elicit competitive behavior.
  2. Fully Cooperative: Both agents receive negative rewards when the ball goes out of play, incentivizing them to keep the ball in the game for as long as possible.
  3. Transitional: Intermediate states between competition and cooperation were explored by varying the reward for scoring while maintaining the penalty for losing the ball.
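All three schemes can be summarized by a single scalar: the reward handed to the scoring player, with the penalty for losing the ball held fixed. The helper below is an illustrative sketch of that parameterization, not code from the paper.

```python
def point_rewards(scorer, scoring_reward):
    """Per-player rewards when the ball leaves play in two-player Pong.

    scorer:         "left" or "right", the player whose shot the opponent missed.
    scoring_reward: reward given to the scoring player; the player who lost the
                    ball is always penalized with -1, matching the schemes above.
                      +1          -> fully competitive (zero-sum)
                      -1          -> fully cooperative (both punished when the ball exits)
                      in between  -> transitional schemes
    """
    r_left = scoring_reward if scorer == "left" else -1.0
    r_right = scoring_reward if scorer == "right" else -1.0
    return r_left, r_right
```

Sweeping the scoring reward from +1 down to -1 reproduces the progression from fully competitive to fully cooperative play examined in the experiments.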

Results

Competitive Behavior

Agents under the competitive reward scheme evolved to efficiently score against their opponent. Quantitative metrics showed a progressive increase in paddle-bounce counts per point, reflecting improved gameplay skills. Serving times decreased as agents learned to rapidly re-initiate gameplay, motivated by the positive Q-value predictions upon serving. Notably, Q-value estimations for competitive settings were overly optimistic, a recognized limitation of Q-learning that could be mitigated in future work using approaches like Double Q-learning.
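For reference, Double Q-learning counters this overestimation by decoupling action selection from action evaluation; the fragment below sketches the corresponding bootstrap target. It is an illustrative PyTorch snippet, not part of the paper's implementation.

```python
import torch

def double_q_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    """Double Q-learning bootstrap target (illustrative; not from the paper's code).

    The online network picks the greedy next action and the target network scores it,
    which dampens the optimistic bias of taking a plain max over noisy Q-estimates.
    """
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)       # argmax_a' Q_online(s', a')
        next_value = target_net(next_state).gather(1, best_action).squeeze(1)  # Q_target(s', a*)
    return reward + gamma * next_value * (1.0 - done)
```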

Cooperative Behavior

In the cooperative setting, agents developed strategies to maximize the duration of gameplay. A significant behavior was the tendency to pass the ball horizontally, minimizing wall bounces and prolonging game exchanges. Serving times increased as agents delayed the initiation of potentially negative-reward gameplay. The collaborative strategy sometimes led to coordinated actions where agents would position themselves optimally to keep the ball in play indefinitely.

Transition Between Behaviors

The transition experiments revealed that even intermediate rewards could significantly influence behavior, showing clear shifts towards cooperative or competitive strategies. Q-values and serving times were sensitive to the reward scheme, demonstrating the adaptability of the DQN framework in handling varied multiagent interactions.

Implications and Future Work

This paper provides a robust demonstration of using DQNs for decentralized multiagent reinforcement learning in complex environments. The findings have practical implications for developing systems where autonomous agents must navigate competitive and collaborative interactions without central oversight. Theoretically, this work extends the understanding of emergent behaviors in multiagent systems when driven by varied reward structures.

Future research could focus on scaling this approach to more agents and more complex environments, exploring the emergence of communication and consensus among numerous agents. Other avenues include adopting more sophisticated multiagent reinforcement learning algorithms and analyzing their scalability and efficiency in high-dimensional settings.

In summary, the paper showcases the versatility and potential of DQNs in studying multiagent systems, paving the way for future research in decentralized learning and emergent behavior analysis.
