FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network (2005.07796v1)

Published 15 May 2020 in cs.CV, cs.LG, and eess.IV

Abstract: Pedestrian intention recognition is very important for developing robust and safe autonomous driving (AD) and advanced driver assistance systems (ADAS) functionalities for urban driving. In this work, we develop an end-to-end pedestrian intention framework that performs well on day- and night-time scenarios. Our framework relies on object detection bounding boxes combined with skeletal features of human pose. We study early, late, and combined (early and late) fusion mechanisms to exploit the skeletal features and reduce false positives as well as to improve the intention prediction performance. The early fusion mechanism results in AP of 0.89 and precision/recall of 0.79/0.89 for pedestrian intention classification. Furthermore, we propose three new metrics to properly evaluate the pedestrian intention systems. Under these new evaluation metrics for the intention prediction, the proposed end-to-end network offers accurate pedestrian intention up to half a second ahead of the actual risky maneuver.

Citations (36)

Summary

  • The paper shows that fusing spatio-temporal (bounding-box) features with skeletal features of human pose, particularly through early fusion, improves pedestrian intention prediction.
  • It introduces an end-to-end framework combining object-detection bounding boxes with skeletal data; the early-fusion model reaches an average precision (AP) of 0.89 with precision/recall of 0.79/0.89.
  • Under three newly proposed metrics, the network predicts risky pedestrian maneuvers up to half a second in advance, which is directly relevant to AD/ADAS in urban settings.


The paper presents FuSSI-Net, a pedestrian intention prediction network that fuses spatio-temporal features with skeletal features of human pose. The work targets autonomous driving (AD) and advanced driver assistance systems (ADAS), with a focus on accurately predicting pedestrian intentions in urban driving, particularly at road intersections.

Overview of the Approach

The research develops an end-to-end framework that works in both day- and night-time scenarios and addresses a key challenge: reducing false positives in pedestrian intention classification. The framework combines object-detection bounding boxes (BBs) with skeletal features of human pose, and the authors study early, late, and combined (early and late) fusion of these features to improve prediction performance.
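To make the difference between the fusion mechanisms concrete, here is a minimal sketch. It assumes each pedestrian is already described by one bounding-box feature vector and one skeleton feature vector for a single frame, uses a toy logistic scorer with hand-set weights in place of a trained network, and leaves out temporal modelling entirely. The function names and the averaging rules used for late and combined fusion are illustrative assumptions, not FuSSI-Net's actual architecture.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Toy classifier (hypothetical): P(intends to cross) from one feature vector."""
    if len(features) != len(weights):
        raise ValueError("feature/weight length mismatch")
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


def early_fusion(bbox_feat, skel_feat, weights, bias) -> float:
    # Feature-level fusion: concatenate both modalities, then classify once.
    return linear_score(list(bbox_feat) + list(skel_feat), weights, bias)


def late_fusion(bbox_feat, skel_feat, bbox_w, bbox_b, skel_w, skel_b) -> float:
    # Decision-level fusion: one classifier per modality, then average (assumed rule).
    p_bbox = linear_score(bbox_feat, bbox_w, bbox_b)
    p_skel = linear_score(skel_feat, skel_w, skel_b)
    return 0.5 * (p_bbox + p_skel)


def combined_fusion(p_early: float, p_late: float) -> float:
    # Combined fusion (assumed rule): average the early- and late-fusion outputs.
    return 0.5 * (p_early + p_late)


if __name__ == "__main__":
    bbox = [0.4, 0.7]    # hypothetical normalized box features
    skel = [0.1, -0.2]   # hypothetical pose features
    p_e = early_fusion(bbox, skel, [0.5, 0.5, 1.0, 1.0], 0.0)
    p_l = late_fusion(bbox, skel, [0.5, 0.5], 0.0, [1.0, 1.0], 0.0)
    print(round(combined_fusion(p_e, p_l), 3))
```

Early fusion lets a single classifier learn interactions between box and pose features; late fusion keeps the modalities separate until the decision, which makes it easier to down-weight one noisy input.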

Methodologically, the paper builds on existing object detection and skeleton-fitting methods and combines their outputs in a single fusion network. The skeletal features play a central role: the authors use them to reduce false positives and thereby improve intention prediction accuracy.
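One possible way to turn detector and pose-estimator outputs into a spatio-temporal input sequence is sketched below. It assumes a skeleton is a list of (x, y) pixel keypoints, a box is (x1, y1, x2, y2) in pixels, and a fixed-length window of recent frames feeds the classifier. The box-relative normalization, the per-frame layout, and the window length are hypothetical choices for illustration, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Keypoint = Tuple[float, float]            # (x, y) in pixels
Frame = Tuple[Sequence[Keypoint], BBox]   # one frame: skeleton + box


def normalize_skeleton(keypoints: Sequence[Keypoint], bbox: BBox) -> List[float]:
    """Flatten keypoints into box-relative coordinates (0..1 when inside the box)."""
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    if w <= 0 or h <= 0:
        raise ValueError("degenerate bounding box")
    flat: List[float] = []
    for x, y in keypoints:
        flat.extend([(x - x1) / w, (y - y1) / h])
    return flat


def frame_features(keypoints: Sequence[Keypoint], bbox: BBox) -> List[float]:
    # Per-frame vector: raw box geometry followed by the normalized skeleton.
    return [float(v) for v in bbox] + normalize_skeleton(keypoints, bbox)


def build_sequence(frames: Sequence[Frame], window: int) -> List[List[float]]:
    """Stack the most recent `window` frames into a window x D sequence."""
    if window <= 0 or len(frames) < window:
        raise ValueError("need a positive window and at least `window` frames")
    return [frame_features(kps, box) for kps, box in frames[-window:]]


if __name__ == "__main__":
    # Hypothetical track: the same two-keypoint skeleton and box over 8 frames.
    frames = [([(5, 10), (6, 12)], (0, 0, 10, 20)) for _ in range(8)]
    seq = build_sequence(frames, window=4)  # hypothetical window length
    print(len(seq), len(seq[0]))  # 4 rows of 4 box values + 4 keypoint coords: "4 8"
```

Normalizing keypoints relative to the box makes the pose features independent of where the pedestrian appears in the image and how far away they are, while the raw box values keep that location and scale information available to the classifier.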

Notable Achievements and Results

The early fusion model achieves an average precision (AP) of 0.89 and a precision/recall of 0.79/0.89 for pedestrian intention classification. Unlike approaches that rely only on bounding-box detections or on CNN-based classification, FuSSI-Net adds skeletal pose information, which the authors use to make predictions more reliable across day- and night-time conditions.
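For reference, the reported numbers follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and AP summarizes precision over the ranked list of classifier scores. The sketch below computes these from binary predictions and scores. It uses the non-interpolated AP definition (the mean of the precision values at the rank of each positive, with ties keeping input order), which may differ from the AP variant used in the paper; the example scores and the 0.75 threshold are made up for illustration.

```python
from typing import Sequence, Tuple


def precision_recall(preds: Sequence[int], labels: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) for binary 0/1 inputs."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of the precision values at the rank of each positive."""
    # Stable sort, so tied scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at this rank
    return total / n_pos


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    labels = [1, 0, 1, 0]
    preds = [1 if s >= 0.75 else 0 for s in scores]  # hypothetical threshold
    print(precision_recall(preds, labels))               # (0.5, 0.5)
    print(round(average_precision(scores, labels), 4))   # 0.8333
```

Precision and recall depend on the chosen decision threshold, whereas AP is threshold-free, which is why a paper can report both an AP and a single precision/recall pair for the same model.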

The paper also proposes three new metrics for evaluating pedestrian intention systems. Under these metrics, the network predicts pedestrian intention accurately up to half a second before the actual risky maneuver. This lead time matters for AD and ADAS, because it leaves room for preventive action before a potential collision.
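The three metrics are not spelled out in this summary, so the following is only a hypothetical illustration of what a "seconds ahead of the maneuver" measure can look like, not one of the paper's metrics. Given per-frame binary intention predictions, the frame index at which the risky maneuver starts, and the video frame rate, it reports how long the prediction had been continuously positive when the maneuver begins. The function name and the "unbroken run of positives" rule are assumptions.

```python
from typing import Optional, Sequence


def prediction_horizon(preds: Sequence[int], event_frame: int, fps: float) -> Optional[float]:
    """Hypothetical lead-time measure.

    Returns how many seconds the intention prediction had been continuously
    positive when the risky maneuver starts at `event_frame`, or None if the
    prediction at the event frame is negative.
    """
    if not 0 <= event_frame < len(preds):
        raise IndexError("event_frame out of range")
    if fps <= 0:
        raise ValueError("fps must be positive")
    if preds[event_frame] != 1:
        return None
    start = event_frame
    # Walk backwards while the previous frame is still predicted positive.
    while start > 0 and preds[start - 1] == 1:
        start -= 1
    return (event_frame - start) / fps


if __name__ == "__main__":
    preds = [0] * 5 + [1] * 16  # positive from frame 5 through frame 20
    print(prediction_horizon(preds, event_frame=20, fps=30.0))  # 0.5
```

At an assumed 30 fps, 15 positive frames before the event correspond to a 0.5 s horizon. Requiring an unbroken run penalizes predictions that flicker on and off, which a downstream planner could not act on reliably.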

Implications and Future Directions

The work has both practical and theoretical implications. Practically, better prediction accuracy can improve the safety of autonomous vehicles and could help reduce accident rates in urban environments. Theoretically, fusing spatio-temporal and skeletal features invites further work on multi-modal data integration for intention prediction.

Looking ahead, future work could focus on reducing false negatives. Doing so would increase prediction reliability and support precise safety-distance assessment and ego-vehicle maneuver planning in AD systems. Expanding the datasets to cover more diverse scenarios, such as varying weather and lighting conditions, could further improve the model's robustness.

This paper thus marks a notable step forward in pedestrian intention prediction, providing a viable path for both immediate application in AD/ADAS technology and future research endeavors within the field of intelligent transport systems.
