FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network

Published 15 May 2020 in cs.CV, cs.LG, and eess.IV | (2005.07796v1)

Abstract: Pedestrian intention recognition is very important to develop robust and safe autonomous driving (AD) and advanced driver assistance systems (ADAS) functionalities for urban driving. In this work, we develop an end-to-end pedestrian intention framework that performs well on day- and night-time scenarios. Our framework relies on object detection bounding boxes combined with skeletal features of human pose. We study early, late, and combined (early and late) fusion mechanisms to exploit the skeletal features and reduce false positives as well as to improve the intention prediction performance. The early fusion mechanism results in AP of 0.89 and precision/recall of 0.79/0.89 for pedestrian intention classification. Furthermore, we propose three new metrics to properly evaluate the pedestrian intention systems. Under these new evaluation metrics for the intention prediction, the proposed end-to-end network offers accurate pedestrian intention up to half a second ahead of the actual risky maneuver.


Citations (36)


Summary

The paper demonstrates that merging spatio-temporal and skeletal features via early fusion significantly improves pedestrian intention prediction.
It introduces an end-to-end framework combining bounding boxes and skeletal data, achieving an average precision of 0.89 with a precision of 0.79 and a recall of 0.89.
The study offers valuable insights for AD/ADAS by predicting risky pedestrian maneuvers half a second in advance in urban settings.

FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network

The paper presents FuSSI-Net, a fusion network that combines spatio-temporal and skeletal features to improve pedestrian intention prediction. The work targets autonomous driving (AD) and advanced driver assistance systems (ADAS), where pedestrian intentions must be predicted accurately in urban driving, particularly at road intersections.

Overview of the Approach

The research develops an end-to-end framework that performs well in both day and night-time settings and addresses a central challenge: reducing false positives in pedestrian intention classification. The framework combines object detection bounding boxes (BBs) with skeletal features of human pose to predict pedestrian intentions. The authors study three ways of combining these inputs: early fusion, late fusion, and a combination of the two.

Methodologically, the study builds on existing object detection and skeleton fitting methods and joins their outputs in a fusion network. The skeletal features matter because they reduce false detections, which in turn improves the accuracy of intention prediction.
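The difference between early and late fusion can be illustrated with a minimal sketch. This is not the FuSSI-Net architecture: the logistic scorer, the function names (`early_fusion`, `late_fusion`), the weights, and the mixing factor `alpha` are all hypothetical stand-ins chosen for illustration. Early fusion joins the bounding-box and skeletal features into one vector before a single classifier scores it, while late fusion scores each feature set separately and then merges the two scores.

```python
from math import exp


def sigmoid(z: float) -> float:
    """Map a real-valued score to the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def linear_score(features, weights, bias=0.0):
    """Hypothetical stand-in classifier: a single logistic unit."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


def early_fusion(bbox_feats, skel_feats, weights, bias=0.0):
    """Concatenate both feature sets first, then classify once."""
    fused = list(bbox_feats) + list(skel_feats)
    return linear_score(fused, weights, bias)


def late_fusion(bbox_feats, skel_feats, bbox_weights, skel_weights, alpha=0.5):
    """Classify each feature set separately, then average the two scores."""
    p_bbox = linear_score(bbox_feats, bbox_weights)
    p_skel = linear_score(skel_feats, skel_weights)
    return alpha * p_bbox + (1.0 - alpha) * p_skel


# Made-up feature values, for illustration only.
bbox = [0.2, 0.4]        # e.g. normalized box position and size
skeleton = [0.7, -0.1]   # e.g. normalized joint offsets

print(early_fusion(bbox, skeleton, weights=[0.5, 0.5, 1.0, 1.0]))
print(late_fusion(bbox, skeleton, bbox_weights=[0.5, 0.5], skel_weights=[1.0, 1.0]))
```

In this sketch the early-fusion classifier can weigh box and skeleton features against each other directly, whereas late fusion only sees two already-computed scores. A combined scheme would apply both steps.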

Notable Achievements and Results

The early fusion model achieves an average precision (AP) of 0.89, with a precision of 0.79 and a recall of 0.89 for pedestrian intention classification. Unlike approaches that rely solely on bounding box detection or CNNs for classification, this approach adds skeletal information, which yields more reliable results across varied conditions.
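For readers less familiar with the metrics quoted above, the following sketch shows how precision, recall, and one common (non-interpolated) form of average precision are computed. The counts and scores are made up for illustration and are not the paper's data. The source does not say which AP variant the paper uses, so the definition here is an assumption.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(scores, labels):
    """Non-interpolated AP: mean of precision at the rank of each positive.

    `scores` are classifier confidences and `labels` are 1 (positive)
    or 0 (negative) ground-truth values, in the same order.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / total_pos


# Made-up counts: 8 true positives, 2 false positives, 1 false negative.
p, r = precision_recall(tp=8, fp=2, fn=1)
print(f"precision={p:.2f} recall={r:.2f}")

# Made-up scores: positives ranked 1st and 3rd out of four samples.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

A precision below recall, as in the reported 0.79/0.89, means the classifier misses few true crossings but raises some false alarms.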

The paper also proposes three new evaluation metrics designed specifically for pedestrian intention systems. Under these metrics, the network predicts pedestrian intention accurately up to half a second before the actual risky maneuver. This lead time is critical for AD and ADAS, since it allows timely preventive measures against potential collisions.
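The idea of predicting a maneuver half a second in advance can be made concrete with a small lead-time calculation. This is a hypothetical illustration, not one of the paper's metrics, which the summary does not define. The function name `prediction_lead_time`, the 30 frames-per-second rate, and the per-frame predictions are all assumptions.

```python
def prediction_lead_time(frame_predictions, event_frame: int, fps: float) -> float:
    """Seconds of uninterrupted positive prediction right before an event.

    `frame_predictions` holds one boolean per video frame (True means the
    model predicts a risky maneuver). `event_frame` is the index of the
    frame where the maneuver actually starts.
    """
    if not 0 <= event_frame < len(frame_predictions):
        raise ValueError("event_frame is outside the sequence")
    if fps <= 0:
        raise ValueError("fps must be positive")
    start = event_frame
    # Walk backwards while the frames just before the event are positive.
    while start > 0 and frame_predictions[start - 1]:
        start -= 1
    return (event_frame - start) / fps


# Made-up sequence at an assumed 30 fps: the model switches to "risky"
# at frame 45, and the maneuver begins at frame 60.
predictions = [False] * 45 + [True] * 16
print(prediction_lead_time(predictions, event_frame=60, fps=30.0))  # 0.5
```

Requiring an uninterrupted run of positive frames is a deliberately strict design choice: a prediction that flickers on and off shortly before the event earns little lead time under this definition.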

Implications and Future Directions

This research has significant implications both practically and theoretically. Practically, the enhanced prediction accuracy can lead to improved safety measures in autonomous vehicles, potentially reducing accident rates in urban environments. Theoretically, the fusion of spatio-temporal with skeletal features opens avenues for further exploration into multi-modal data integration in intention prediction tasks.

Looking ahead, future work could focus on reducing false negatives. Doing so is vital not only for increasing prediction reliability but also for enabling precise safety distance assessments and ego-vehicle maneuvers within AD systems. Expanding the datasets to cover more diverse scenarios, including varying weather and lighting conditions, could further improve the robustness of the model.

This study thus marks a notable step forward in pedestrian intention prediction, providing a viable path for both immediate application in AD/ADAS technology and future research endeavors within the field of intelligent transport systems.

Authors (16)