NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning (2304.05979v2)

Published 12 Apr 2023 in cs.RO

Abstract: Developing robotic technologies for use in human society requires ensuring the safety of robots' navigation behaviors while adhering to pedestrians' expectations and social norms. However, maintaining real-time communication between robots and pedestrians to avoid collisions can be challenging. To address these challenges, we propose a novel socially-aware navigation benchmark called NaviSTAR, which utilizes a hybrid Spatio-Temporal grAph tRansformer (STAR) to understand interactions in human-rich environments fusing potential crowd multi-modal information. We leverage off-policy reinforcement learning algorithm with preference learning to train a policy and a reward function network with supervisor guidance. Additionally, we design a social score function to evaluate the overall performance of social navigation. To compare, we train and test our algorithm and other state-of-the-art methods in both simulator and real-world scenarios independently. Our results show that NaviSTAR outperforms previous methods with outstanding performance. The source code and experiment videos of this work are available at: https://sites.google.com/view/san-navistar

Summary

The paper demonstrates a novel hybrid spatio-temporal graph transformer combined with preference learning to enhance socially aware robot navigation.
It integrates spatial and temporal features via a fully connected graph to capture and model human-robot interactions, boosting navigation success and social compliance.
Empirical tests show NaviSTAR outperforms existing methods, achieving higher success rates and social scores in both open and constrained environments.

This essay examines the paper "NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning." The paper introduces NaviSTAR, which the authors describe as a benchmark for socially aware robot navigation, built on a hybrid Spatio-Temporal Graph Transformer and preference learning. Its goal is for robots to navigate safely in human environments while adhering to social norms and pedestrian expectations.

The proposed approach leverages the Spatio-Temporal Graph Transformer to capture intricate interactions between humans and robots, thereby comprehensively modeling human-robot interaction (HRI). This framework is augmented by a preference learning mechanism that effectively encodes human expectations and social norms into the decision-making process of robotic navigation systems.

Methodology

The authors distinguish their work from existing methodologies by highlighting the ability of NaviSTAR to integrate spatial and temporal features through a fully connected graph representation of HRI. The core technological innovation lies in the use of a Spatio-Temporal Graph Transformer combined with a multi-modal transformer. This amalgamation allows for the capture of long-term dependencies and fusion of heterogeneous spatial and temporal features inherent to dynamic, human-filled environments.
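The idea of attending over agents (spatial) and over timesteps (temporal), then fusing the two streams, can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: it uses single-head attention with random, untrained weights, no masking, and a single cross-attention step as a stand-in for the multi-modal transformer. The sizes `T`, `N`, `D` and all function names are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries_from, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention.

    Queries come from `queries_from`, keys and values from `context`.
    Both have shape (..., n, d); attention runs over the n axis.
    Returns (output, attention weights).
    """
    q = queries_from @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, N, D = 5, 4, 8  # hypothetical: timesteps, agents (robot + 3 humans), feature size

def make_weights():
    """Random projection matrices (untrained, illustration only)."""
    return [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3)]

states = rng.normal(size=(T, N, D))  # synthetic agent states

# Spatial stream: at each timestep every agent attends to every agent,
# which corresponds to a fully connected interaction graph.
spatial_out, spatial_attn = attention(states, states, *make_weights())

# Temporal stream: for each agent every timestep attends to every timestep.
per_agent = np.swapaxes(states, 0, 1)  # (N, T, D)
temporal_out, temporal_attn = attention(per_agent, per_agent, *make_weights())
temporal_out = np.swapaxes(temporal_out, 0, 1)  # back to (T, N, D)

# Fusion stand-in: spatial features query the temporal features.
fused, cross_attn = attention(spatial_out, temporal_out, *make_weights())

print(spatial_attn.shape, temporal_attn.shape, fused.shape)
```

Each row of `spatial_attn[t]` is a probability distribution over agents at timestep `t`, and each row of `temporal_attn[i]` is a distribution over timesteps for agent `i`; matrices of this kind are what attention visualizations typically display.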

Numerical results presented in the paper indicate that NaviSTAR exhibits superior performance compared to existing state-of-the-art methods in both simulated and real-world environments. Specifically, the results highlight improvements in both navigation success rates and social compliance as assessed by a novel social score function.

Numerical Results and Claims

The authors present concrete numerical evidence supporting the efficacy of NaviSTAR. In a series of 500 tests conducted under varying conditions, the algorithm demonstrated a higher success rate and improved social scores relative to traditional methods such as CADRL and SARL, as well as more recent approaches like SRNN. Notably, NaviSTAR outperformed these baseline methods not only in open spaces but also in constrained environments with varied fields of view.
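As a rough illustration of how per-episode outcomes from such a test series might be aggregated into rates, consider the sketch below. The outcome labels (`success`, `collision`, `timeout`) and the ten-episode list are synthetic placeholders, not data from the paper, and the paper's social score function is not reproduced here because this summary does not define it.

```python
from collections import Counter

OUTCOMES = ("success", "collision", "timeout")  # assumed outcome labels

def summarize(outcomes):
    """Return the fraction of episodes ending in each outcome."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {name: counts.get(name, 0) / total for name in OUTCOMES}

# Synthetic placeholder episodes, NOT results reported in the paper.
episodes = ["success"] * 7 + ["collision"] * 2 + ["timeout"] * 1
rates = summarize(episodes)
print(rates)
```

Comparing methods then amounts to running each policy on the same set of test cases and comparing the resulting rate dictionaries.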

The authors attribute the model's advantage to its representation of agent interactions via the Spatio-Temporal Graph Transformer network. They support this with visualizations of the spatial-temporal and cross-modal attention matrices, which illustrate how the system weighs interactions and dependencies within a crowd, a capability that matters for safe and socially acceptable navigation.

Additionally, the inclusion of preference learning into the reinforcement learning setup is posited to yield a more natural and desirable robotic behavior by adjusting the reward function based on human feedback.
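One common way to learn a reward function from pairwise human feedback is the Bradley-Terry model, where the probability that segment A is preferred over segment B is a sigmoid of the difference in their predicted returns. The summary above does not state the exact formulation used in the paper, so the following is a minimal sketch under that assumption. The linear reward model, the two features per step, and the synthetic "supervisor" (`true_theta`) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_first_preferred(theta, feat_a, feat_b):
    """Bradley-Terry: P(A preferred) = sigmoid(R(A) - R(B)), R = features . theta."""
    return 1.0 / (1.0 + np.exp(-(feat_a - feat_b) @ theta))

# Synthetic trajectory segments: 200 pairs, 10 steps each, 2 features per step
# (imagine e.g. progress toward the goal and intrusion into personal space).
segments_a = rng.normal(size=(200, 10, 2))
segments_b = rng.normal(size=(200, 10, 2))

# With a linear reward, the segment return is theta . (sum of step features).
feat_a = segments_a.sum(axis=1)
feat_b = segments_b.sum(axis=1)

# Hypothetical supervisor: prefers progress, dislikes intrusion.
true_theta = np.array([1.0, -2.0])
labels = ((feat_a - feat_b) @ true_theta > 0).astype(float)  # 1.0 if A preferred

# Fit the reward parameters by gradient descent on the mean cross-entropy loss.
theta = np.zeros(2)
learning_rate = 0.05
for _ in range(500):
    p = prob_first_preferred(theta, feat_a, feat_b)
    grad = ((p - labels)[:, None] * (feat_a - feat_b)).mean(axis=0)
    theta -= learning_rate * grad

predicted = prob_first_preferred(theta, feat_a, feat_b) > 0.5
accuracy = (predicted == labels.astype(bool)).mean()
print(theta, accuracy)
```

In a preference-based reinforcement learning loop, the learned reward would replace or supplement a hand-crafted one when updating the policy, and new preference queries would be collected periodically to keep refining it.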

Implications and Future Developments

The implications of this research are significant for the field of socially aware robot navigation. The ability to seamlessly integrate into human-rich environments while respecting social norms is a pivotal requirement for the deployment of service robots in public spaces. The methodology introduced by NaviSTAR can potentially be extended to other domains requiring complex interaction modeling and decision-making, such as autonomous vehicles and assistive robotics.

Future research could focus on enhancing the scalability of NaviSTAR to handle even more complex environments with a larger number of interacting agents. Additionally, investigating the integration of other forms of human feedback and adaptive learning could further refine the system's ability to comply with diverse social expectations.

In summary, the NaviSTAR framework advances socially aware navigation by combining a Spatio-Temporal Graph Transformer with preference learning. The reported results indicate that it outperforms the compared baselines on both navigation success and social compliance, a step toward autonomous robots that move comfortably among people.