Coordinated Policy Optimization for Multi-Agent Systems
The paper "Learning to Simulate Self-Driven Particles System with Coordinated Policy Optimization" presents a method for simulating Self-Driven Particle (SDP) systems, which are representative of several real-world scenarios including traffic flows and flocking birds. The authors introduce a reinforcement learning approach, termed Coordinated Policy Optimization (CoPO), aimed at addressing the complexities inherent in SDP multi-agent systems where individual agents dynamically alternate between cooperative and competitive behaviors.
Overview of Self-Driven Particles
SDP systems are characterized by individual agents pursuing distinct objectives while their interactions give rise to complex collective behaviors. Traditional rule-based and hydrodynamic models work well in open, unconstrained settings, but they struggle in more structured, non-stationary conditions such as specific traffic scenes. Hand-crafted controllers and existing multi-agent reinforcement learning (MARL) techniques also fall short here, as they typically assume predefined roles or team structures and therefore fail to capture the shifting, partly cooperative and partly competitive nature of SDP interactions.
Introduction of Coordinated Policy Optimization
CoPO is a novel MARL method that incorporates principles from social psychology to coordinate agent behaviors in SDP systems. The method is evaluated primarily in traffic simulation environments, where the agents are vehicles that must navigate and interact within complex road networks. CoPO operates at two levels of coordination: local and global.
- Local Coordination: Inspired by the social value orientation measure from social psychology, CoPO introduces a Local Coordination Factor (LCF), an angle that mixes each agent's own reward with the average reward of its neighbors. The LCF captures each agent's inclination toward selfish, cooperative, or competitive behavior and folds neighborhood interactions into the learning objective (see the first sketch after this list).
- Global Coordination: CoPO employs a meta-learning strategy to optimize the population-level distribution of LCFs, so that locally coordinated behaviors stay aligned with system-wide objectives such as minimizing collisions and maximizing task success (see the second sketch after this list).
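
To make the local coordination concrete, the sketch below computes an LCF-weighted coordinated reward. It follows the paper's formulation in spirit: the LCF is an angle, and the coordinated reward mixes the agent's own reward with the mean reward of nearby agents as cos(LCF) * r_own + sin(LCF) * r_neighborhood. The function names, the radius-based neighbor helper, and the example values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def coordinated_reward(own_reward, neighbor_rewards, lcf_radians):
    """Mix an agent's own reward with its neighbors' mean reward.

    Minimal sketch of CoPO-style local coordination: the Local
    Coordination Factor (LCF) is an angle, so cos(LCF) weights the
    selfish term and sin(LCF) weights the neighborhood term.
    LCF = 0 is purely selfish, LCF = pi/2 is purely cooperative,
    and a negative LCF is competitive. Names are illustrative.
    """
    neighborhood_reward = float(np.mean(neighbor_rewards)) if len(neighbor_rewards) else 0.0
    return np.cos(lcf_radians) * own_reward + np.sin(lcf_radians) * neighborhood_reward


def neighbors_within_radius(positions, agent_idx, radius):
    """Indices of other agents within `radius` of the given agent (illustrative helper)."""
    dists = np.linalg.norm(positions - positions[agent_idx], axis=1)
    mask = dists <= radius
    mask[agent_idx] = False  # an agent is not its own neighbor
    return np.nonzero(mask)[0]


# Example: a mildly cooperative agent (LCF = 30 degrees) with two neighbors.
r_c = coordinated_reward(own_reward=1.0,
                         neighbor_rewards=[0.2, -0.5],
                         lcf_radians=np.deg2rad(30.0))
```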
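
The global coordination step can be illustrated with a deliberately simplified stand-in. Instead of the paper's meta-gradient through the policy update, the sketch below uses a score-function (REINFORCE-style) estimate to shift the mean and standard deviation of a Gaussian LCF distribution toward values whose sampled LCFs produced higher population-wide return. The Gaussian parameterization, learning rate, and estimator choice are assumptions made purely for illustration.

```python
import numpy as np

def update_lcf_distribution(lcf_mean, lcf_std, sampled_lcfs, global_returns, lr=0.01):
    """One simplified update of the population-level LCF distribution.

    Stand-in for CoPO's meta-learning step: each LCF was sampled from
    N(lcf_mean, lcf_std) and is paired with the population-wide return
    it produced; a score-function gradient estimate pushes the
    distribution toward LCF values associated with higher global
    return. This is NOT the paper's exact meta-gradient, only an
    illustrative approximation.
    """
    advantages = global_returns - global_returns.mean()
    # Gradients of log N(lcf | mean, std) with respect to mean and std.
    z = (sampled_lcfs - lcf_mean) / lcf_std
    grad_mean = np.mean(advantages * z / lcf_std)
    grad_std = np.mean(advantages * (z ** 2 - 1.0) / lcf_std)
    new_mean = lcf_mean + lr * grad_mean
    new_std = max(1e-3, lcf_std + lr * grad_std)  # keep the distribution non-degenerate
    return new_mean, new_std
```

In the actual method the update differentiates through the local policy optimization rather than acting directly on sampled returns, but the intent is the same: move the LCF distribution in the direction that improves the global objective.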
Experimental Validation and Results
The empirical evaluation uses a purpose-built set of traffic environments, each posing distinct structural challenges. The key metrics are success rate, efficiency, and safety. CoPO outperforms independent PPO-based policy optimization and alternative MARL baselines on these measures, excelling in particular in environments that require negotiation and yielding, such as intersections and tollgates.
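
As a rough illustration of how such metrics can be aggregated per episode, the sketch below computes a success rate, a collision-based safety score, and a completion-time-based efficiency score from per-agent terminal outcomes. The paper's exact metric definitions differ in detail; the formulas and outcome labels here are assumptions for illustration only.

```python
def episode_metrics(outcomes, episode_length, max_length):
    """Per-episode metrics in the spirit of the paper's evaluation.

    `outcomes` holds one terminal status per agent, e.g. "success"
    (reached its destination), "crash", or "timeout". The definitions
    below are illustrative assumptions, not the paper's formulas.
    """
    n = len(outcomes)
    success_rate = outcomes.count("success") / n
    safety = 1.0 - outcomes.count("crash") / n                        # fewer collisions -> safer
    efficiency = success_rate * (1.0 - episode_length / max_length)   # faster completion -> higher
    return {"success_rate": success_rate, "safety": safety, "efficiency": efficiency}
```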
Implications and Future Directions
The proposed CoPO method represents a valuable tool for advancing realistic simulations in multi-agent systems, with particular applicability in intelligent transport systems and pedestrian modeling. Its success in generating socially compliant and diverse interaction behaviors in traffic scenarios underscores the utility of integrating social psychological principles into MARL frameworks.
Potential future research avenues include extending CoPO beyond traffic environments to broader categories of SDP systems, improving generalization across varied traffic densities, and combining MARL with imitation learning to produce more human-compliant behavior. Additionally, making the environments' perception more realistic, for example by integrating sensor inputs closer to real-world conditions, remains an important step toward higher simulation fidelity.