SocialGym 2.0: Multi-Agent Social Navigation
- SocialGym 2.0 is a modular simulation platform that supports multi-agent navigation in human-centered environments using a MARL framework.
- It integrates with PettingZoo and Stable Baselines3, offering flexible scenario generation, customizable rewards, and efficient ROS communication.
- The platform benchmarks state-of-the-art RL techniques with advanced evaluation metrics to ensure social compliance and collision-free navigation.
SocialGym 2.0 is an open-source, modular simulation platform designed for the development, training, and benchmarking of multi-agent social navigation policies in environments replicating real-world human spaces. It focuses on both autonomous robot interaction and social compliance in shared, dynamic settings. Notable for its flexible scenario generation, reward specification, and multi-agent reinforcement learning (MARL) framework, SocialGym 2.0 is built to advance research in multi-robot navigation under complex, human-centered constraints (Sprague et al., 2023, Kapoor et al., 2023).
1. System Architecture and Software Design
SocialGym 2.0 is architected as a highly configurable extension of the PettingZoo multi-agent Gym interface, integrating seamlessly with the Stable Baselines3 (SB3) API for reinforcement learning workflows. Its core simulation logic is abstracted across the following main modules:
- Top-Level Interface: Extends PettingZoo’s reset/step API for N-agent, partially observable stochastic games. Observer and Rewarder helper classes modularize observation and reward design, respectively (a minimal sketch follows this list).
- Learning Integration: Directly supports off-the-shelf MARL algorithms (e.g., PPO, A2C, DQN) from SB3 and SB3-Contrib. Offers LSTM architectures for partial observability.
- Communication and Execution: Uses ROS messaging to interface between the Python environment and a lightweight C++ engine (UTMRS). The environment sends high-level discrete actions over ROS topics to UTMRS, which updates the world state and publishes new observations. Local navigation and human simulation modules subscribe and return feasible, continuous motions under kinodynamic constraints.
- Scenario Generation: Users define 2D vector maps (as sets of line segments representing walls/obstacles), overlay navigation graphs to define discrete action spaces, and specify agent start/end nodes as global paths. Scenarios support curriculum learning via agent count and environment diversity.
- Installation and Execution: SocialGym 2.0 is containerized via Docker and manages ROS communications internally with minimal configuration requirements (Sprague et al., 2023).
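The sketch below illustrates how this layering might look in code: a PettingZoo-style parallel environment that delegates observation and reward construction to pluggable helpers and forwards discrete actions to a simulator bridge. The class and attribute names (`SocialNavEnv`, `sim_bridge`, `observe`, `apply_discrete_actions`, etc.) are illustrative assumptions that mirror the description above, not SocialGym 2.0’s actual API.

```python
# Illustrative sketch only: names such as SocialNavEnv, Observer/Rewarder methods,
# and the sim_bridge interface are assumptions mirroring the description above.
from pettingzoo import ParallelEnv


class SocialNavEnv(ParallelEnv):
    """N-agent, partially observable stochastic game over a navigation graph."""

    def __init__(self, observer, rewarder, sim_bridge):
        self.observer = observer      # pluggable observation design
        self.rewarder = rewarder      # pluggable reward design
        self.sim = sim_bridge         # ROS <-> UTMRS communication layer (hypothetical)
        self.agents = []

    def reset(self, seed=None, options=None):
        state = self.sim.reset_scenario(seed)            # spawn agents on the nav graph
        self.agents = list(state.agent_ids)
        obs = {a: self.observer.observe(a, state) for a in self.agents}
        return obs, {a: {} for a in self.agents}

    def step(self, actions):
        # actions are high-level, discrete nav-graph moves; UTMRS returns the new world state
        state = self.sim.apply_discrete_actions(actions)
        obs = {a: self.observer.observe(a, state) for a in self.agents}
        rewards = {a: self.rewarder.reward(a, state) for a in self.agents}
        terminations = {a: state.done(a) for a in self.agents}
        truncations = {a: False for a in self.agents}
        infos = {a: {} for a in self.agents}
        return obs, rewards, terminations, truncations, infos
```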
2. Environment Modeling and Scenario Specification
Environments in SocialGym 2.0 are structured around vectorized 2D maps, navigation graphs, and flexible scenario scripts:
- 2D Vector Maps: Users provide lists of line segments for obstacles. The system includes mini-game templates—Open, Doorway, Hallway, Intersection, Roundabout—and supports importing realistic floorplans from SocialGym 1.0.
- Navigation Graphs: Nodes and edges define the discrete action space ("move to next node" or "stop"), support forced conflicts (e.g., shared narrow doorways), and overlay the map geometry to structure agent motion.
- Scenarios: Each scenario defines global start/end node pairs for each agent and supports customizable agent populations per episode, enabling both fixed and stochastic environment sampling.
This design enables rapid prototyping and diverse experimental setups, essential for benchmarking MARL algorithms and understanding emergent agent behavior under social constraints (Sprague et al., 2023, Kapoor et al., 2023).
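As a concrete illustration, a scenario could be expressed as plain Python data along the lines below; every field name here is a hypothetical stand-in for the concepts described above (vector map, navigation graph, per-agent routes), not SocialGym 2.0’s actual file format.

```python
# Illustrative scenario specification; field names are hypothetical stand-ins.
scenario = {
    "vector_map": [                    # walls/obstacles as 2D line segments (x1, y1, x2, y2)
        (0.0, 0.0, 10.0, 0.0),
        (0.0, 3.0, 4.5, 3.0),          # two wall segments leaving a gap: a doorway
        (5.5, 3.0, 10.0, 3.0),
    ],
    "nav_graph": {
        "nodes": {0: (1.0, 1.5), 1: (5.0, 1.5), 2: (5.0, 4.5), 3: (9.0, 4.5)},
        "edges": [(0, 1), (1, 2), (2, 3)],   # discrete actions: move along an edge, or STOP
    },
    "agents": [
        {"start": 0, "goal": 3},       # opposing routes force a conflict at the doorway node
        {"start": 3, "goal": 0},
    ],
    "num_humans": 4,                   # optional pedestrians driven by the Social Forces Model
}
```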
3. Agent and Human Dynamics
SocialGym 2.0 models both robot and human motion with fidelity to real-world operational constraints:
- Agent Kinematics: Agents default to a unicycle-style differential drive model, $\dot{x} = v\cos\theta,\ \dot{y} = v\sin\theta,\ \dot{\theta} = \omega$, with configurable limits on $v$, $\omega$, $\dot{v}$, and $\dot{\omega}$ per robot.
- Local Navigation Module: Samples kinodynamically feasible, collision-free trajectories toward intermediate navigation nodes, updating state using either differential-drive or omni-drive equations.
- Human Crowd Modeling: When enabled, human agents are simulated using the Social Forces Model (Helbing & Molnár, 1995), $\mathbf{F}_i = \mathbf{F}_i^{\text{goal}} + \sum_{j \neq i}\mathbf{F}_{ij}^{\text{ped}} + \sum_{o}\mathbf{F}_{io}^{\text{obs}}$, with terms for goal attraction, pedestrian repulsion, and static obstacle avoidance. This captures group formation, dispersal, and other complex social behaviors.
The result is an ecosystem where both robots and humans are subject to physically and socially plausible motion constraints, supporting MARL research at the intersection of navigation function and compliance (Sprague et al., 2023, Kapoor et al., 2023).
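For intuition, the following minimal sketch implements one Euler step of the unicycle model and a simplified Social Forces update; the gains, limits, and function signatures are illustrative choices, not SocialGym 2.0 defaults.

```python
# Minimal sketch of the motion models described above; parameters are illustrative.
import numpy as np


def unicycle_step(x, y, theta, v, omega, dt, v_max=1.5, omega_max=1.0):
    """One Euler step of the differential-drive (unicycle) model with speed limits."""
    v = float(np.clip(v, -v_max, v_max))
    omega = float(np.clip(omega, -omega_max, omega_max))
    return x + v * np.cos(theta) * dt, y + v * np.sin(theta) * dt, theta + omega * dt


def social_force(p_i, v_i, goal_i, neighbors, k_goal=1.0, k_ped=2.0, sigma=0.5, v_des=1.3):
    """Simplified Social Forces: goal attraction plus exponential pedestrian repulsion."""
    to_goal = goal_i - p_i
    force = k_goal * (v_des * to_goal / (np.linalg.norm(to_goal) + 1e-6) - v_i)
    for p_j in neighbors:                      # repulsion from each nearby pedestrian
        diff = p_i - p_j
        dist = np.linalg.norm(diff) + 1e-6
        force += k_ped * np.exp(-dist / sigma) * diff / dist
    return force
```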
4. Observations, Action Spaces, and Reward Design
Observations, actions, and reward functions in SocialGym 2.0 are fully modular and customizable:
- Observation Space:
- Agent’s self-state features: distance to goal, position, velocity, heading, and speed
- Relative features for other agents (with an optional observation-angle cutoff)
- Collision and success flags
- Optional LSTM embeddings for reasoning over a variable number of neighboring agents
- Action Space:
- Discrete mode: graph node selection and STOP action
- Continuous mode (in development): direct velocity commands bypassing the navigation graph
- Reward Function: Each term (goal, progress, collision, per-step penalty, entropy) can be weighted and scheduled by the user. The Rewarder enables plug-and-play, user-defined reward components. For social compliance, both heuristic and data-driven (e.g., SNGNN (Kapoor et al., 2023)) reward signals are supported.
This schema enables benchmarking both classical hand-crafted policies and state-of-the-art deep RL or GNN-based agents under unified metrics (Sprague et al., 2023, Kapoor et al., 2023).
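A plug-and-play reward composition in the spirit of the Rewarder might look like the following; the term functions, weights, and state accessors (`reached_goal`, `goal_distance`, `collided`) are hypothetical placeholders, not the platform’s actual interfaces.

```python
# Hypothetical plug-and-play reward composition; term functions, weights, and
# state accessors are illustrative placeholders.
def goal_reward(agent, state):
    return 100.0 if state.reached_goal(agent) else 0.0

def progress_reward(agent, state):
    # positive when the agent moved closer to its goal since the last step
    return state.prev_goal_distance(agent) - state.goal_distance(agent)

def collision_penalty(agent, state):
    return -10.0 if state.collided(agent) else 0.0

def step_penalty(agent, state):
    return -0.1   # small per-step cost encourages efficient paths

class Rewarder:
    """Weighted sum of user-supplied reward terms; terms are swappable per experiment."""
    def __init__(self, weighted_terms):
        self.weighted_terms = weighted_terms          # list of (weight, callable) pairs

    def reward(self, agent, state):
        return sum(w * term(agent, state) for w, term in self.weighted_terms)

rewarder = Rewarder([(1.0, goal_reward), (0.5, progress_reward),
                     (1.0, collision_penalty), (1.0, step_penalty)])
```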
5. Supported Algorithms and Baseline Benchmarks
SocialGym 2.0 natively supports major RL and MARL algorithms via Stable Baselines3:
- PPO (default), A2C, DQN, DDPG, SAC, and LSTM-PPO
- Benchmarked baselines include CADRL and CADRL-LSTM with unicycle kinematics, "Enforced Order/Any Order" subgoal policies for conflict resolution, and the "Only Local" policy (local-only navigation).
Training adopts centralized training with decentralized execution, supports variable agent counts per episode, and translates between high-level discrete actions and low-level continuous motions. LSTM-PPO shows strong generalization to larger, unseen agent populations (up to 10), and policies optimized with an SNGNN-based reward demonstrate improved social-compliance metrics (Sprague et al., 2023, Kapoor et al., 2023).
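As an example of how such training can be wired up with off-the-shelf tooling, the sketch below vectorizes a PettingZoo parallel environment with SuperSuit and trains LSTM-PPO (RecurrentPPO) from SB3-Contrib. `make_socialnav_env` is a placeholder, and SocialGym 2.0’s own training scripts may organize this differently.

```python
# Hedged sketch of connecting a PettingZoo parallel env to Stable Baselines3
# via SuperSuit vectorization; make_socialnav_env is a hypothetical factory.
import supersuit as ss
from sb3_contrib import RecurrentPPO          # LSTM-PPO for partial observability

parallel_env = make_socialnav_env()           # placeholder: returns a ParallelEnv
vec_env = ss.pettingzoo_env_to_vec_env_v1(parallel_env)
vec_env = ss.concat_vec_envs_v1(vec_env, 4, base_class="stable_baselines3")

model = RecurrentPPO("MlpLstmPolicy", vec_env, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("lstm_ppo_socialnav")
```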
6. Evaluation Metrics and Analysis
Performance in SocialGym 2.0 is quantified across success, efficiency, and social-compliance axes:
- Collected Metrics:
- Success rate (full and partial task completion)
- Collision rate per episode
- Average trajectory length (step count)
- Stop time (stationary duration per agent)
- Peak velocity change (the maximum change in agent speed, used as a proxy for motion awkwardness)
- In SocNavGym: Success weighted by Path Length (SPL), Success weighted by Time Length (STL, computed as success × (1 − time / max time)), personal-space compliance, and human-robot comfort metrics (see the sketch after this list)
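The sketch below shows how the path- and time-weighted success metrics named above are typically computed; the episode record fields are illustrative assumptions rather than the platform’s logging format.

```python
# Sketch of the SPL and STL computations described above; field names are assumptions.
def spl(episodes):
    """Success weighted by (normalized inverse) Path Length, averaged over episodes."""
    total = 0.0
    for ep in episodes:
        if ep["success"]:
            total += ep["shortest_path"] / max(ep["path_length"], ep["shortest_path"])
    return total / len(episodes)


def stl(episodes, max_time):
    """Success weighted by remaining time budget: success * (1 - time / max_time)."""
    return sum(ep["success"] * (1.0 - ep["time"] / max_time) for ep in episodes) / len(episodes)


# Illustrative episode records.
results = [
    {"success": True, "path_length": 14.2, "shortest_path": 12.0, "time": 35.0},
    {"success": False, "path_length": 20.0, "shortest_path": 12.0, "time": 120.0},
]
print({"SPL": spl(results), "STL": stl(results, max_time=120.0)})
```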
Benchmarking reveals:
- No single algorithm is universally superior across all mini-game types (Doorway, Hallway, Intersection, Roundabout, Open scenarios)
- Sub-goal enforcement policies excel in constrained conflict zones, but can degrade in unstructured scenarios
- "Only Local" performs in sparse settings but fails under high agent density ( agents)
- LSTM-based policies enhance generalization and robustness to increased agent numbers
- SNGNN-rewarded agents (SocNavGym) achieve higher personal-space compliance (over 97%) and lower discomfort metrics than heuristic baselines (Sprague et al., 2023, Kapoor et al., 2023)
7. Installation, Customization, and Extensibility
SocialGym 2.0 is distributed for rapid deployment and extension:
- Installation:
- Docker-based setup (`git clone`, `docker build`, `docker run`)
- Internal ROS configuration without the need for custom node programming
- SocNavGym: `pip install socnavgym` or a source install
- Customization:
- Scenario and map editing via Python helpers or YAML/JSON configuration
- Modular swapping of observation, reward, and algorithm components
- Scenario randomization and curriculum learning via scenario generator and configuration scripts
- Extension hooks for custom entities, reward functions, and human-trajectory predictors
- Sample Usage: Minimal Python scripts train and evaluate policies; experiment runners support large-scale batch experiments. Switching algorithms or reward modules is managed by editing configuration files (an illustrative config sketch follows this list).
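An illustrative configuration-driven runner is sketched below; the config keys and the `make_socialnav_env` factory are hypothetical, shown only to convey how swapping the algorithm or reward weights via a config file can work.

```python
# Illustrative config-driven experiment runner; make_socialnav_env and the config
# keys are hypothetical placeholders.
import yaml
from stable_baselines3 import PPO, A2C
from sb3_contrib import RecurrentPPO

CONFIG = yaml.safe_load("""
algorithm: RecurrentPPO        # or PPO, A2C
scenario: doorway
num_agents: 3
reward_weights: {goal: 1.0, progress: 0.5, collision: 1.0, step: 1.0}
total_timesteps: 500000
""")

ALGORITHMS = {"PPO": (PPO, "MlpPolicy"),
              "A2C": (A2C, "MlpPolicy"),
              "RecurrentPPO": (RecurrentPPO, "MlpLstmPolicy")}


def run_experiment(cfg):
    # build the environment from the scenario name, agent count, and reward weights
    env = make_socialnav_env(cfg["scenario"], cfg["num_agents"], cfg["reward_weights"])
    algo_cls, policy = ALGORITHMS[cfg["algorithm"]]
    model = algo_cls(policy, env, verbose=1)
    model.learn(total_timesteps=cfg["total_timesteps"])
    return model
```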
The open architecture enables adoption for standardized benchmarking, principled reward function research, and integration with state-of-the-art RL tools.
SocialGym 2.0 provides researchers with a comprehensive, reproducible framework for multi-agent social navigation under complex, dynamic, and human-centric constraints, facilitating research at the intersection of robotics, reinforcement learning, and human-robot interaction (Sprague et al., 2023, Kapoor et al., 2023).