Papers
Topics
Authors
Recent
2000 character limit reached

SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation (2511.21135v1)

Published 26 Nov 2025 in cs.RO, cs.AI, and cs.CV

Abstract: Embodied navigation that adheres to social norms remains an open research challenge. Our \textbf{SocialNav} is a foundational model for socially-aware navigation with a hierarchical "brain-action" architecture, capable of understanding high-level social norms and generating low-level, socially compliant trajectories. To enable such dual capabilities, we construct the SocNav Dataset, a large-scale collection of 7 million samples, comprising (1) a Cognitive Activation Dataset providing social reasoning signals such as chain-of-thought explanations and social traversability prediction, and (2) an Expert Trajectories Pyramid aggregating diverse navigation demonstrations from internet videos, simulated environments, and real-world robots. A multi-stage training pipeline is proposed to gradually inject and refine navigation intelligence: we first inject general navigation skills and social norms understanding into the model via imitation learning, and then refine such skills through a deliberately designed Socially-Aware Flow Exploration GRPO (SAFE-GRPO), the first flow-based reinforcement learning framework for embodied navigation that explicitly rewards socially compliant behaviors. SocialNav achieves +38% success rate and +46% social compliance rate compared to the state-of-the-art method, demonstrating strong gains in both navigation performance and social compliance. Our project page: https://amap-eai.github.io/SocialNav/

Summary

  • The paper introduces SocialNav, a model that integrates high-level social reasoning with low-level trajectory planning to enhance both navigation efficiency and social compliance.
  • It employs a hierarchical 'brain-action' architecture that combines vision-language models with a flow-based reinforcement learning (SAFE-GRPO) framework for norm-aware action generation.
  • The approach is validated on large-scale multimodal datasets and real-world tests, demonstrating significant improvements in success rate and compliance over existing methods.

SocialNav: Human-Inspired Foundation Model for Socially-Aware Embodied Navigation

Introduction and Motivation

The paper "SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation" (2511.21135) addresses a critical limitation in embodied navigation: most learned navigation models focus on geometric efficiency (e.g., shortest paths, collision avoidance) and neglect adherence to social norms. This often leads to robot behaviors such as jaywalking, trespassing, or traversing restricted zones, which are unacceptable in real-world deployments with humans present. SocialNav is introduced as a hierarchical foundation model that explicitly integrates high-level social reasoning and low-level trajectory generation, aiming to close this gap with strong empirical results in both navigation and social compliance.

SocialNav Architecture

SocialNav adopts a hierarchical "brain-action" architecture. The Brain Module leverages a vision-LLM (VLM), specifically Qwen2.5-VL, to encode social navigation priors and produce interpretable chain-of-thought (CoT) explanations or predict socially traversable regions. The Action Expert, built on a conditional flow matching paradigm, generates trajectories from high-level priors in a manner that is both efficient and norm-compliant. Figure 1

Figure 1: SocialNav enables socially-aware navigation by fusing semantic reasoning for traversable zones and CoT explanations with executable trajectory planning in real environments.

The synergy between the cognitive and action branches ensures that semantic understanding and real-world action generation are tightly integrated, enabling the agent to interpret complex social environments and act accordingly.

SocNav Dataset and Benchmark

A major limitation of prior work is the lack of large-scale, multimodal training data that encapsulates both social cognition and action expertise. The SocNav Dataset developed for this project contains over 7 million samples and consists of two complementary components:

  • Expert Trajectories Pyramid (ETP): Aggregates trajectories from internet videos (2M samples), high-fidelity simulated environments (1.7M), and real-world robot data (340K), offering a broad distribution of human-like navigation behaviors across diverse settings.
  • Cognitive Activation Dataset (CAD): Provides social reasoning signals, including 1.2M socially traversable region annotations, 825K navigation CoTs (prompted via Qwen2.5VL-72B), and 1M general VQA samples, instilling semantically grounded knowledge critical for compliance with human norms.

The SocNav Benchmark is a high-fidelity evaluation framework combining Isaac Sim physics and 3DGS rendering, enabling quantitative assessment of navigation and social compliance in nine newly captured large-scale social scenes.

SAFE-GRPO: Flow-Based Reinforcement Learning for Social Compliance

Imitation learning is insufficient for robust social alignment, since behavior cloning from expert demonstrations may not capture the causal structure underpinning social conventions. To address this, SocialNav introduces SAFE-GRPO (Socially-Aware Flow Exploration GRPO), a flow-based reinforcement learning (RL) framework that provides norm-aware reward signals:

  • Social compliance is rewarded via semantic occupancy maps encouraging clearance from non-traversable areas.
  • Additional terms in the reward function incentivize trajectory smoothness, efficiency, and expert similarity.
  • A stochastic differential equation controls exploration with semantic guidance from the VLM, contrasting with unstructured random search in standard RL.

This design ensures that the agent internalizes the principles of social navigation rather than mimicking surface-level actions, leading to robust generalization in complex, dynamic settings.

Experimental Results and Numerical Analysis

The SocialNav model is thoroughly evaluated in open-loop, closed-loop, and real-world robotic deployment settings. Key metrics include success rate (SR), route completion (RC), success weighted by path length (SPL), and two novel measures of social compliance: Distance Compliance Rate (DCR) and Time Compliance Rate (TCR).

Closed-Loop Navigation and Social Compliance Results:

  • SocialNav (Full) achieves SR of 86.1 (+38% over the best baseline), RC of 91.2, SPL of 77.4, DCR of 82.5 (+46% over baseline), and TCR of 82.9.
  • Significant improvements over CityWalker (SR 47.8, DCR 36.1, TCR 36.6), GNM, ViNT, and NoMaD baselines demonstrate strong alignment to social norms.

(Figure 2)

Figure 2: Qualitative comparison of trajectories on diverse scenes; SocialNav (green) prefers socially sanctioned walkways, while CityWalker (red) opts for riskier, non-compliant shortcuts.

  • In real-world deployments with a Unitree Go2 robot, SocialNav attains an average SR of 85% (street crossing: 18/20 successful trials), compared to 62.5% for CityWalker and 50% for NoMaD.
  • Qualitative trajectory visualization on the benchmark shows SocialNav consistently avoids restricted areas and follows pedestrian norms, unlike prior methods that frequently violate social rules.

Ablation Studies and Architecture Analysis

Systematic ablation reveals that each data modality is crucial:

  • Large-scale internet trajectories (DvideoD_\text{video}) increase navigation performance and social compliance.
  • Simulated recovery trajectories (DsimD_\text{sim}) enhance robustness.
  • Cognitive annotations (DcogD_\text{cog}) are essential for high social compliance, as RL without cognitive priors degrades performance.
  • SAFE-GRPO RL delivers substantial gains in compliance metrics, but at a minor cost in geometric efficiency, aligning with the trade-off between socially optimal and shortest paths.

Implications and Outlook

Practically, SocialNav provides a scalable, generalizable solution for socially-aware navigation in heterogeneous environments, from urban streets to malls and campuses. Theoretically, it demonstrates the effectiveness of combining interpretability (CoT reasoning), large-scale cognitive action data, and flow-based RL for preference alignment.

Future developments could further extend RL reward signals via vision-LLMs, targeting context-dependent conventions beyond static traversability. Enhancements to reward shaping could yield greater adaptability to nuanced human preferences in dynamic crowds or specialized environments. The foundational dataset and evaluation protocol set a new standard for embodied intelligence research.

Conclusion

SocialNav achieves major advances in both navigation efficiency and social compliance by integrating hierarchical cognitive-action modeling, large-scale semantically annotated datasets, and flow-based RL. The architecture is highly extensible, and the empirical evidence indicates strong generalization and alignment to human values, laying the groundwork for more socially competent embodied agents.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Github Logo Streamline Icon: https://streamlinehq.com
X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

Sign up for free to view the 6 tweets with 45 likes about this paper.