Social-HM3D Benchmark

Updated 11 October 2025

Social-HM3D is a large-scale benchmark featuring 3D indoor scenes populated with simulated human agents exhibiting realistic, goal-driven, and collision-avoidant behaviors.
It evaluates socially-aware navigation by measuring key metrics such as task success, path efficiency, and adherence to personal space norms.
The Falcon framework, integrated with future trajectory prediction, enhances anticipatory planning and social compliance in dynamic indoor environments.

The Social-HM3D benchmark is a large-scale evaluation suite targeting socially-aware navigation and interaction in indoor environments, built on photo-realistic 3D scenes densely populated with simulated human agents that exhibit goal-driven, collision-avoidant behaviors. It is designed to test whether embodied agents can navigate complex, realistic scenarios while complying with social norms, such as maintaining personal space and avoiding disruption of natural human movement. The benchmark was introduced alongside the Falcon reinforcement learning framework, which uses future trajectory prediction and dedicated penalty terms to encourage anticipatory, context-sensitive planning and action selection (Gong et al., 2024).

1. Benchmark Construction and Scene Composition

The Social-HM3D benchmark uses 844 scenes from the Habitat-Matterport 3D (HM3D) dataset, spanning diverse environments including residences, offices, and retail spaces. Each scene is populated with up to six human agents, with the number of humans scaled to the area of the environment so that occupancy stays plausible. Humans in the benchmark are not static obstacles but autonomous agents with realistic movement patterns: they pursue goals and pause naturally, while multi-agent collision-avoidance algorithms such as ORCA handle avoidance between agents. This scene diversity exposes embodied agents to a wide range of topological challenges and crowd configurations, mirroring real-world use cases such as indoor robot navigation, service robotics, and adaptive planning in socially dynamic interiors.
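The exact rule that maps scene area to human count is not specified here. As a minimal sketch, assuming a hypothetical fixed floor area per person (`area_per_human_m2`) and a clamp to the range of one to six humans, an area-based population rule could look like this:

```python
import math


def num_humans_for_scene(navigable_area_m2: float,
                         area_per_human_m2: float = 20.0,  # hypothetical occupancy parameter
                         max_humans: int = 6) -> int:
    """Hypothetical density rule: one human per `area_per_human_m2` of navigable
    floor area, clamped to [1, max_humans]. Not the benchmark's confirmed rule."""
    if navigable_area_m2 <= 0:
        raise ValueError("navigable area must be positive")
    count = math.floor(navigable_area_m2 / area_per_human_m2)
    # Assumption: every scene gets at least one human and never more than the cap.
    return max(1, min(max_humans, count))
```

Under these assumed parameters, a 90 m² scene would receive four humans, while any scene larger than 120 m² would hit the cap of six.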

A secondary dataset, Social-MP3D, built on 72 scenes from Matterport3D, tests agents’ generalization to unseen or distributionally shifted environments. Together, the two datasets evaluate human–robot interaction and social compliance in photorealistic scenes.

2. Task Definition and Evaluation Metrics

The principal task evaluated on Social-HM3D is goal-driven navigation by an embodied agent (robot) among dynamic human agents. The agent must reach randomly assigned goals while observing social norms (e.g., keeping sufficient clearance from each person) and avoiding collisions.

Task performance is measured using classical embodied AI metrics:

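Task Success Rate (SR): Fraction of episodes in which the agent reaches the goal.
Success weighted by Path Length (SPL): Success weighted by the ratio of the shortest-path length to the length of the path the agent actually traveled, measuring path efficiency.
Success weighted by Time Length (STL): Success weighted by the ratio of the optimal time to goal to the time the agent actually took, measuring temporal efficiency.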
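The three task metrics can be computed from per-episode records. The sketch below uses the standard SPL definition, $\frac{1}{N}\sum_i S_i \frac{l_i}{\max(p_i, l_i)}$, and assumes STL is defined by analogy with time (steps) in place of path length; the `EpisodeResult` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EpisodeResult:
    success: bool
    shortest_path_m: float  # geodesic start-to-goal distance (l_i)
    path_length_m: float    # length of the path the agent actually traveled (p_i)
    optimal_steps: int      # steps an optimal agent would need (assumed available)
    steps_taken: int        # steps the agent actually used


def success_rate(eps: Sequence[EpisodeResult]) -> float:
    return sum(e.success for e in eps) / len(eps)


def spl(eps: Sequence[EpisodeResult]) -> float:
    # SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)
    return sum(
        e.success * e.shortest_path_m / max(e.path_length_m, e.shortest_path_m)
        for e in eps
    ) / len(eps)


def stl(eps: Sequence[EpisodeResult]) -> float:
    # Assumed SPL analogue over time: (1/N) * sum_i S_i * t*_i / max(t_i, t*_i)
    return sum(
        e.success * e.optimal_steps / max(e.steps_taken, e.optimal_steps)
        for e in eps
    ) / len(eps)
```

For example, two successful episodes with efficiency ratios 0.8 and 1.0 plus one failure give SR = 2/3 and SPL = (0.8 + 1.0 + 0)/3 = 0.6; a failed episode contributes zero to SPL and STL no matter how efficient its path was.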

Social compliance is quantified with metrics including:

Personal Space Compliance (PSC): Proportion of timesteps in which the agent stays outside a predefined safety radius (1.0 m) around every human.
Human Collision Rate: Percentage of episodes involving contact between the agent and a human agent.
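Both social metrics reduce to distance checks over time-aligned trajectories. The sketch below assumes robot and human positions are sampled at the same timesteps, treats a distance of exactly 1.0 m as compliant, and uses a hypothetical contact distance (`contact_dist_m`, e.g. the sum of robot and human radii) to decide whether a collision occurred.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]  # one position per timestep, time-aligned across agents


def personal_space_compliance(robot: Trajectory, humans: List[Trajectory],
                              radius_m: float = 1.0) -> float:
    """Fraction of timesteps at which the robot is at least `radius_m` from every human."""
    compliant = 0
    for t, (rx, ry) in enumerate(robot):
        if all(math.hypot(rx - h[t][0], ry - h[t][1]) >= radius_m for h in humans):
            compliant += 1
    return compliant / len(robot)


def episode_has_collision(robot: Trajectory, humans: List[Trajectory],
                          contact_dist_m: float = 0.5) -> bool:
    """True if the robot ever comes closer than `contact_dist_m` (hypothetical) to a human."""
    return any(math.hypot(rx - h[t][0], ry - h[t][1]) < contact_dist_m
               for t, (rx, ry) in enumerate(robot) for h in humans)


def human_collision_rate(episodes: List[Tuple[Trajectory, List[Trajectory]]],
                         contact_dist_m: float = 0.5) -> float:
    """Fraction of episodes containing at least one robot-human contact."""
    flags = [episode_has_collision(r, hs, contact_dist_m) for r, hs in episodes]
    return sum(flags) / len(flags)
```

Note the different denominators: PSC is averaged over timesteps within an episode, while the collision rate is averaged over episodes, so a single brief contact marks a whole episode as colliding.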

High PSC values (∼90%) indicate adherence to proxemic standards, which is critical for practical robot behavior in densely populated or socially sensitive spaces.

3. Human Behavior Modeling and Scene Dynamics

Humans in Social-HM3D are simulated with model-based motion planners that combine deterministic goals with stochastic trajectory perturbations, pausing, and avoidance. Each human's initial position and destination are randomized per episode. Kinematic updates use ORCA (Optimal Reciprocal Collision Avoidance), a distributed scheme in which each agent selects a velocity that remains collision-free with respect to its neighbors over a short time horizon, extrapolating their current velocities and sharing responsibility for avoidance with them; this yields naturalistic pedestrian flows.
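ORCA itself solves a small linear program over half-plane velocity constraints for each agent. The sketch below is a simpler, swapped-in technique, a sampling-based velocity-obstacle scheme, that illustrates the same core idea: assume neighbors keep their current velocities, discard candidate velocities that lead to contact within a time horizon, and pick the safe candidate closest to the preferred velocity. Unlike ORCA it does not split avoidance responsibility reciprocally; all function names and parameters are illustrative.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]
Agent = Tuple[Vec, Vec, float]  # (position, velocity, radius) of a neighboring agent


def time_to_collision(p_rel: Vec, v_rel: Vec, r_sum: float) -> float:
    """Earliest t >= 0 with |p_rel + v_rel * t| <= r_sum, or math.inf if none.
    p_rel and v_rel are this agent's position and velocity relative to the other."""
    c = p_rel[0] ** 2 + p_rel[1] ** 2 - r_sum ** 2
    if c <= 0.0:
        return 0.0  # discs already overlap
    a = v_rel[0] ** 2 + v_rel[1] ** 2
    b = p_rel[0] * v_rel[0] + p_rel[1] * v_rel[1]
    if a == 0.0 or b >= 0.0:
        return math.inf  # no relative motion, or moving apart
    disc = b * b - a * c
    if disc <= 0.0:
        return math.inf  # closest approach stays outside r_sum
    return (-b - math.sqrt(disc)) / a


def choose_velocity(pos: Vec, pref_vel: Vec, radius: float, neighbors: List[Agent],
                    max_speed: float, horizon: float = 2.0,
                    n_angles: int = 16, n_speeds: int = 4) -> Vec:
    """Sampled velocity closest to pref_vel that is collision-free for `horizon` s."""
    s = math.hypot(*pref_vel)
    if s > max_speed:  # clamp the preferred velocity to the speed limit
        pref_vel = (pref_vel[0] * max_speed / s, pref_vel[1] * max_speed / s)
    candidates: List[Vec] = [pref_vel, (0.0, 0.0)]
    for i in range(n_angles):
        theta = 2.0 * math.pi * i / n_angles
        for j in range(1, n_speeds + 1):
            speed = max_speed * j / n_speeds
            candidates.append((speed * math.cos(theta), speed * math.sin(theta)))

    def min_ttc(v: Vec) -> float:
        return min((time_to_collision((pos[0] - p[0], pos[1] - p[1]),
                                      (v[0] - u[0], v[1] - u[1]), radius + r)
                    for p, u, r in neighbors), default=math.inf)

    safe = [v for v in candidates if min_ttc(v) >= horizon]
    if not safe:
        # Every sample conflicts: fall back to the one that delays contact longest.
        return max(candidates, key=min_ttc)
    return min(safe, key=lambda v: math.hypot(v[0] - pref_vel[0], v[1] - pref_vel[1]))
```

Calling `choose_velocity` once per agent per simulation step, and integrating positions with the returned velocities, produces the kind of decentralized avoidance that gives rise to the emergent crowd patterns described next.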

This simulation framework yields emergent behaviors such as bottlenecks, lane formation, and occlusion—all phenomena highly relevant to embodied AI and autonomous system planning. The scene design explicitly balances challenge complexity by scaling human density to environment size, preventing unrealistically crowded or sparse navigation spaces.

4. The Falcon Framework

The Falcon model employs deep reinforcement learning augmented with trajectory prediction to enable anticipatory social navigation. Its architecture comprises:

Observation Processing: Egocentric depth images, encoded by a ResNet-50 backbone, together with GPS+Compass readings.
Temporal Feature Extraction: Two-layer LSTM captures action–observation history.
Auxiliary Spatial-Temporal Precognition Module: Predicts human counts, 2D locations, and future H-step trajectories using a bi-directional stacked LSTM with attention.
Reward Shaping via Social Cognition Penalty (SCP): Penalizes both immediate and anticipated obstruction of human paths. The “Trajectory Obstruction Penalty” is formulated as:

$r_{traj} = \sum_{k=t+1}^{t+H} \sum_{i=1}^{N} \begin{cases} \beta_{traj} \cdot \frac{1}{k-t+1}, & \text{if } d_{traj_i}^{k} < 0.05\,\text{m} \\ 0, & \text{otherwise} \end{cases}$

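where $N$ is the number of humans, $H$ is the prediction horizon, $\beta_{traj}$ is the penalty weight, and $d_{traj_i}^k$ is the robot–human distance for human $i$ at predicted future step $k$. The factor $\frac{1}{k-t+1}$ weights near-term obstructions more heavily than distant ones. Additional terms penalize personal-space violations and collisions.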

Loss Function Integration: The overall training loss is a weighted sum of the main DD-PPO navigation loss and the auxiliary precognition losses:

$L_{total} = \beta_{main} \cdot L_{main} + \beta_{aux} \cdot L_{aux}$

This explicit multi-timescale penalization drives agents to learn policies that anticipate and avoid future human positions, yielding behaviors that respect both task efficiency and social constraints.
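The trajectory obstruction penalty defined above can be evaluated directly from predicted human trajectories. A minimal sketch, assuming $d_{traj_i}^k$ is the distance between the robot's current position and human $i$'s predicted position at step $k$, and assuming a negative $\beta_{traj}$ so that the term acts as a penalty; the function name and argument layout are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def trajectory_obstruction_penalty(robot_pos: Point,
                                   predicted_trajs: Sequence[Sequence[Point]],
                                   beta_traj: float,
                                   threshold_m: float = 0.05) -> float:
    """r_traj = sum_{k=t+1}^{t+H} sum_i beta_traj / (k - t + 1) if d_i^k < threshold.

    predicted_trajs[i][j] is human i's predicted position at step k = t + 1 + j,
    so each inner sequence has length H. beta_traj is assumed negative (a penalty).
    """
    r = 0.0
    for traj in predicted_trajs:
        for j, (hx, hy) in enumerate(traj):
            steps_ahead = j + 1  # k - t
            d = math.hypot(robot_pos[0] - hx, robot_pos[1] - hy)
            if d < threshold_m:
                r += beta_traj / (steps_ahead + 1)
    return r
```

With $\beta_{traj} = -1$ and a single human predicted to pass within 5 cm of the robot at steps $t+2$ and $t+3$, the penalty is $-\frac{1}{3} - \frac{1}{4} = -\frac{7}{12}$, illustrating how later conflicts contribute less.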

5. Experimental Findings and Baseline Comparisons

On Social-HM3D, Falcon substantially outperforms both classical planners (A*, ORCA) and proximity-aware RL methods. The model achieves a 55% task success rate while maintaining approximately 90% personal space compliance, demonstrating robust performance in dense, realistic indoor scenarios. This dual achievement is significant, as many baselines sacrifice social compliance for task success, or vice versa.

A plausible implication is that future-aware trajectory forecasting, coupled with structured penalization, offers a scalable approach to socially compliant navigation in dense, real-world environments, going beyond what purely reactive planners typically offer.

6. Benchmark Significance and Research Directions

The Social-HM3D benchmark introduces a comprehensive, scalable evaluation framework for developing social navigation policies in embodied AI. Its combination of photo-realistic environments, dynamic, naturalistic human behavior, and richly parameterized performance metrics enables rigorous testing of navigation algorithms with respect to both efficiency and compliance with social norms.

Current research trajectories include:

Improving predictive models for human movement under uncertainty.
Integrating nuanced social relationship and interaction modeling, as in Social 3D Scene Graph representations (Bartoli et al., 29 Sep 2025).
Expanding task diversity to include complex manipulation, multi-goal planning, and social interaction beyond mere avoidance and proxemics.
Advancing sim-to-real transfer by bridging the gap between simulated human agents and true human behaviors observed in real-world trials.

The benchmark thus fosters interdisciplinary advances at the intersection of embodied AI, robotics, human–computer interaction, and social cognition, providing a rigorous platform for developing agents that act safely, efficiently, and in compliance with societal expectations in complex, dynamic spaces.