
SocNav Dataset: Benchmarking Social Navigation

Updated 30 November 2025
  • SocNav Dataset is a suite of datasets providing structured, human-annotated static and dynamic scenarios to study social navigation behaviors.
  • It integrates diverse data modalities including spatial configurations, time-series trajectories, and visual-language cues for robust benchmarking.
  • Annotations using crowdsourced and expert ratings measure social comfort, disruption, and task efficiency, ensuring reliable performance metrics.

SocNav Dataset refers to a suite of datasets—originating with SocNav1, SocNav2, and extending to new large-scale instances (e.g., as described in (Chen et al., 26 Nov 2025))—created for benchmarking and learning socially compliant navigation behaviors in robots and embodied agents. These datasets provide structured, human-annotated scenarios and/or large-scale demonstrations designed to capture the complexity of social navigation, personal and interaction-space constraints, and multi-agent interactions. The range spans from static scene comfort assessments to dynamic multi-modal trajectory sequences, supporting both analytic modeling and advanced deep learning techniques for social navigation metrics and policies.

1. Dataset Composition and Generations

The original SocNav1 dataset comprises 5,735 unique static scenarios with 9,280 human labels, capturing robot positional configurations in proximity to humans and obstacles inside indoor spaces. Each sample is annotated with a scalar “social comfort score” in [0, 100], interpreted as the human-perceived comfort if a robot were to occupy a given pose. The scenarios encode: (a) a polyline map of the room; (b) the 2D positions and orientations of all agents (robot and humans); (c) static objects and their spatial extents; and (d) explicit human-human or human-object “interaction links” indicating joint attention, conversation, or spatial interdependence (Manso et al., 2019).
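
As a rough illustration of how such a static scenario can be held in memory, the sketch below parses one JSON record per line into a small Python structure. The field names (room, humans, objects, links, score) are hypothetical placeholders, not the exact SocNav1 schema.

```python
import json
from dataclasses import dataclass

@dataclass
class StaticScenario:
    room: list      # polyline of (x, y) wall vertices
    humans: list    # per-human dicts with position and orientation
    objects: list   # static objects with position and spatial extent
    links: list     # (id_a, id_b) interaction links between entities
    score: float    # human comfort rating in [0, 100]

def load_scenarios(path):
    """Parse one scenario per JSON line (field names are illustrative)."""
    scenarios = []
    with open(path) as f:
        for line in f:
            raw = json.loads(line)
            scenarios.append(StaticScenario(
                room=raw["room"],
                humans=raw["humans"],
                objects=raw["objects"],
                links=raw.get("links", []),
                score=float(raw["score"]),
            ))
    return scenarios
```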

The follow-up SocNav2 dataset transitions to dynamic, time-indexed scenarios, each spanning a 4-s horizon with 35 frames and capturing the full (x, y, θ, v_x, v_y, ω) state for both robots and simulated humans (Bachiller et al., 2021). SocNav2 introduces two scalar scores per sequence—Q1 (“Robot does not disturb humans”) and Q2 (“Robot moves efficiently toward the goal without disturbance”)—annotated on a [0, 100] scale by six human raters per scenario, reflecting both social disruption and task efficiency. These scenarios cover more complex, dynamic multi-agent settings with a variable number of humans (up to ~15), explicit goal-directed navigation, and dynamic “interaction” flags.
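
A minimal sketch of how one agent's 35-frame state sequence might be stacked into an array, assuming a per-frame dictionary layout that is illustrative rather than the exact SocNav2 file format:

```python
import numpy as np

T = 35  # frames per 4-second sequence

def sequence_to_array(frames, agent_id):
    """Stack one agent's (x, y, theta, vx, vy, omega) state over T frames.

    `frames` is assumed to be a list of per-frame dicts keyed by agent id.
    """
    states = np.zeros((T, 6), dtype=np.float32)
    for t, frame in enumerate(frames[:T]):
        a = frame[agent_id]
        states[t] = [a["x"], a["y"], a["theta"], a["vx"], a["vy"], a["omega"]]
    return states

def normalise_scores(q1, q2):
    """Map raw [0, 100] ratings for Q1/Q2 to [0, 1] regression targets."""
    return q1 / 100.0, q2 / 100.0
```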

Recently, large-scale derivatives extend the SocNav concept far beyond scenario enumeration. For example, the “SocNav Dataset” of (Chen et al., 26 Nov 2025) aggregates ≈7 million samples comprising both synthetic and real-world expert trajectories and cognitive activation signals, annotated for traversability, chain-of-thought reasoning, and visual question answering. Table 1 summarizes the composition of this expansive corpus:

Component                                       Count
Internet-video pseudo-trajectories (D_video)    2,000,000
High-fidelity simulated trajectories (D_sim)    1,700,000
Real-world robot trajectories (D_real)            340,000
Social traversability polygons (D_trav)         1,200,000
Navigation chain-of-thought (D_CoT)               825,000
Visual QA samples (D_VQA)                       1,000,000
Total                                          ~7,065,000

Samples contain rich spatiotemporal, visual, and language-based information, supporting diverse learning pipelines from imitation learning to vision-language modeling (Chen et al., 26 Nov 2025).

2. Annotation Protocols and Human Ratings

SocNav1 and SocNav2 employ crowdsourced ratings to operationalize human social conventions:

  • SocNav1 human raters (n=12) assigned an integer “comfort” score to each configuration following explicit guidelines: intrusions into personal or interaction space lower the score, positions implying a collision receive a score of 0, and ratings are adjusted for crowd density, with the emphasis on social cost rather than task success. Scores are pooled and analyzed for agreement (e.g., via pooled standard deviation).
  • SocNav2 raters (n=6) independently scored each dynamic sequence on Q1 and Q2, measuring both social non-disruption and efficient navigation. Inter- and intra-rater agreement was analyzed using linearly weighted kappa, with intra-rater κ in Q1 at 0.83–0.88 (“almost perfect”) and inter-rater κ at 0.56–0.85 (“moderate” to “substantial”) (Bachiller et al., 2021).
  • Large-scale SocNav Dataset (Chen et al., 26 Nov 2025): Social traversability and chain-of-thought annotations are produced by trained operators and language annotators. Traversable regions are demarcated as low-vertex polygons into coarse social occupancy maps. CoT responses are elicited via structured prompts, pairing image/trajectory history with a textual completion.
  • Quality Control: All SocNav variants adopt strict quality pipelines, including duplicate ratings, control items, and κ-based filtering to ensure inter-rater reliability (e.g., SocNav3 retains only raters with κ_intra ≥ 0.1 and κ_inter ≥ 0.2) (Bachiller-Burgos et al., 1 Sep 2025); a minimal filtering sketch follows this list.
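
The sketch below shows one way such a κ-based rater filter could be implemented with scikit-learn's linearly weighted Cohen's kappa. The thresholds follow the SocNav3 values quoted above, but the pairing of duplicate items and the binning of scores are assumptions rather than the published pipeline.

```python
from sklearn.metrics import cohen_kappa_score

def rater_passes(own_first_pass, own_second_pass, own_scores, consensus_scores,
                 min_intra=0.1, min_inter=0.2):
    """Illustrative rater filter in the spirit of the SocNav quality pipelines.

    `own_first_pass`/`own_second_pass` are one rater's two ratings of duplicated
    items (binned scores); `own_scores`/`consensus_scores` compare the rater
    against pooled ratings on shared items.
    """
    kappa_intra = cohen_kappa_score(own_first_pass, own_second_pass, weights="linear")
    kappa_inter = cohen_kappa_score(own_scores, consensus_scores, weights="linear")
    return kappa_intra >= min_intra and kappa_inter >= min_inter
```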

3. Data Formats, Features, and Scenario Representation

SocNav datasets are delivered as line-based or per-scenario JSON objects (or, for large-scale versions, in combined image, trajectory, and language files):

  • SocNav1/2 Static/Dynamic Examples: Each scenario describes a set of agents with their positions, orientations, and types; explicit room/wall polylines; object positions; and, for SocNav2, full velocity/orientation time-series. Features suitable for learning include Euclidean distances d_{ij}, graph-based adjacency for explicit or proximity-based edges, one-hot entity/type coding, and additional crowd-density or field-of-view flags (Manso et al., 2019, Bachiller et al., 2021); see the graph-construction sketch after this list.
  • Large SocNav Dataset (Chen et al., 26 Nov 2025): Each sample encodes a visual history (O_{t-n:t}, RGB frames), a pose history (P_{t-n:t}, 2D poses), a navigation goal g, and a trajectory/action target for point-goal navigation. Cognitive Activation samples provide images with polygonal traversability masks, or multi-modal (trajectory + language) pairs for CoT and VQA.
  • Evaluation Split: Standard train/dev/test partitions are provided, often with augmentation routines such as mirroring (x→–x) or rotation.
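
A minimal sketch of the proximity-based graph construction mentioned above, assuming agent/object positions are already available as a 2D array; the 2 m radius and the exact feature layout (position plus one-hot type) are illustrative choices, not the published feature set.

```python
import numpy as np

def build_graph(positions, types, radius=2.0):
    """Build a proximity-based graph from entity positions.

    `positions` is an (N, 2) array of 2D coordinates; `types` is a list of
    entity labels (e.g. "robot", "human", "object"). Edges connect entities
    closer than `radius` metres.
    """
    n = len(positions)
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    adjacency = ((dists < radius) & ~np.eye(n, dtype=bool)).astype(np.float32)

    # One-hot entity/type coding concatenated with the raw 2D position.
    type_set = sorted(set(types))
    one_hot = np.eye(len(type_set), dtype=np.float32)[[type_set.index(t) for t in types]]
    features = np.concatenate([positions.astype(np.float32), one_hot], axis=1)
    return adjacency, features
```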

4. Metrics, Benchmarks, and Learning Protocols

SocNav datasets define clear quantitative and qualitative metrics for benchmarking:

  • Prediction tasks: Regression on the comfort or disruption scores (S_c, or Q1 and Q2), evaluated with mean absolute error (MAE), root mean square error (RMSE), and R²; for binned scores, classification accuracy or weighted kappa. Scenario-to-graph mappings admit graph neural network (GNN) usage, e.g., Graph Convolutional Networks and message-passing layers, with per-node features and edge encoding (Manso et al., 2019, Bachiller et al., 2021).
  • Baseline models: Recent SocNav benchmarks provide sequence models (a 4-layer GRU followed by an MLP) that regress to [0, 1] using robot-state, analytical-metric, and context embeddings as features. Trained with an MSE loss against human scores and LeakyReLU activations, these models demonstrate context-sensitive metric adaptation, e.g., penalizing high speed in “lab” tasks but rewarding it in “fire” tasks (Bachiller-Burgos et al., 1 Sep 2025); a minimal model sketch follows this list.
  • Social navigation challenge tasks: Large-scale SocNav datasets enable policy learning benchmarks utilizing imitation learning (IL) and reinforcement learning (RL), supported by metrics such as success rate (SR), route completion, SPL (Success weighted by Path Length), and Distance/Time Compliance Rate (DCR/TCR) (see definitions in (Chen et al., 26 Nov 2025)). Open-loop angular error and compliance with annotated traversable regions are recommended for benchmarking.
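
A minimal PyTorch sketch in the spirit of the GRU+MLP baseline described above. The feature dimensionality, hidden size, and the exact composition of the per-frame feature vector (robot state, analytical metric values, context embedding) are assumptions; the 4-layer GRU, [0, 1] output, LeakyReLU activations, and MSE supervision follow the description in the text.

```python
import torch
import torch.nn as nn

class SocNavScorer(nn.Module):
    """Sequence scorer: 4-layer GRU over per-frame features, MLP head to [0, 1]."""
    def __init__(self, feat_dim=32, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, num_layers=4, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.LeakyReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # keeps the predicted score in [0, 1]
        )

    def forward(self, x):       # x: (batch, frames, feat_dim)
        _, h = self.gru(x)      # h: (num_layers, batch, hidden_dim)
        return self.head(h[-1]).squeeze(-1)

# Training-step sketch: MSE against human scores rescaled to [0, 1].
model = SocNavScorer()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 35, 32)    # dummy batch of 35-frame feature sequences
target = torch.rand(8)        # normalised human scores
optimizer.zero_grad()
loss = loss_fn(model(x), target)
loss.backward()
optimizer.step()
```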

5. Public Availability, Access Protocols, and Licensing

All canonical SocNav datasets are published under open or research-use licenses, with code and data repositories providing full experimental pipelines:

Dataset/Tool          Repository/URL                           License
SocNav1               https://github.com/ljmanso/SocNav1       CC-BY/MIT
SocNav2               https://github.com/gnns4hri/sngnnv2      Public/Academic
SocNav3               https://github.com/SocNavData/SocNav3    Apache 2.0
Large-Scale SocNav    https://amap-eai.github.io/SocialNav/    Apache 2.0

Researchers can generate or simulate new scenarios using provided survey tools and schema, collect additional human ratings, and train or adapt models under a common, transparent benchmark (Bachiller-Burgos et al., 1 Sep 2025, Manso et al., 2019, Chen et al., 26 Nov 2025).

6. Impact, Usage Scenarios, and Future Directions

SocNav datasets are foundational tools for evaluating and improving socially aware robot navigation. They enable:

  • Data-driven social cost estimation and benchmarking for both analytic and learned navigation metrics.
  • Development of robust graph-based and sequence-based models sensitive to context, task, and multi-agent dynamics.
  • Integration with vision-language and chain-of-thought reasoning for embodied AI and foundation-model research (Chen et al., 26 Nov 2025).
  • Standards for fair comparison across algorithms and reproducible social navigation system evaluation.

The SocNav methodology has directly influenced the design and validation of large-scale foundation models for embodied navigation, established rigorous annotation and consensus pipelines, and catalyzed new research on context-adaptive social metric learning and interactive, human-in-the-loop navigation policy optimization.

As social navigation remains an open multi-disciplinary challenge, the evolving SocNav datasets continue to serve as empirical grounds for benchmarking, model selection, and simulation-to-real transfer in socially aware robotics (Bachiller-Burgos et al., 1 Sep 2025, Manso et al., 2019, Chen et al., 26 Nov 2025).
