- The paper presents InterFuser, a novel sensor fusion transformer that achieves top performance on CARLA benchmarks.
- It fuses multi-modal data from RGB cameras and LiDAR using an encoder-decoder transformer to generate interpretable scene features.
- A dedicated safety controller constrains the vehicle's actions to a safe set derived from the model's interpretable intermediate outputs, which also helps diagnose failures for robust autonomous driving.
Overview of "Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer"
The paper "Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer" presents a novel approach to autonomous driving, addressing existing challenges in perception and decision-making by leveraging a model named InterFuser. This framework utilizes a multi-modal sensor fusion transformer to enhance scene understanding and ensure safety through interpretability.
Introduction and Background
Deployment of autonomous driving has been slowed by safety concerns, particularly in dense-traffic situations where rare, complex events are hard to anticipate. Current systems often struggle with scene understanding, for example when occluded agents appear suddenly. Multi-modal sensor fusion is critical to mitigating these failures, yet existing approaches either fuse sensors too simplistically or integrate them inadequately. Moreover, the interpretability of the decision-making process, which is crucial for diagnosing failures and improving safety, remains underexplored.
InterFuser Framework
InterFuser processes and unifies data from multiple sensor types and viewpoints through a transformer model, providing comprehensive scene understanding and making safety-informed decisions.
- Sensor Inputs: The framework integrates RGB cameras and LiDAR. The cameras supply multi-view images covering different perspectives, while LiDAR adds 3D geometric context. Together they enable robust scene perception, including distant traffic lights and occluded objects (a minimal LiDAR-to-BEV preprocessing sketch follows this list).
- Transformer Encoder and Decoder: A transformer encoder fuses the multi-modal tokens into intermediate representations; a decoder with learned queries then predicts future vehicle waypoints together with interpretable features such as an object density map and traffic-rule states (see the fusion sketch after this list).
- Safety Controller: A downstream safety controller uses these interpretable features to keep actions within a predefined safe set. Beyond enforcing safety online, the features expose why the model acted as it did, helping identify failure causes for iterative refinement (a speed-limiting sketch also follows the list).
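On the sensor-input side, a common preprocessing step for this kind of fusion is rasterizing the LiDAR point cloud into a bird's-eye-view (BEV) pseudo-image before a CNN backbone tokenizes it. Below is a minimal NumPy sketch; the grid size, extent, and two-bin height split are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def lidar_to_bev(points, grid=224, extent_m=28.0, ground_z=-2.0):
    """points: (N, 3) array of (x, y, z) in the ego frame.
    Returns a (2, grid, grid) histogram of points below/above ground_z + 1 m."""
    bev = np.zeros((2, grid, grid), dtype=np.float32)
    scale = grid / (2 * extent_m)                     # pixels per meter
    xs = ((points[:, 0] + extent_m) * scale).astype(int)
    ys = ((points[:, 1] + extent_m) * scale).astype(int)
    valid = (xs >= 0) & (xs < grid) & (ys >= 0) & (ys < grid)
    # Split points into two height bins so the CNN sees coarse vertical structure.
    channel = (points[:, 2] > ground_z + 1.0).astype(int)
    np.add.at(bev, (channel[valid], ys[valid], xs[valid]), 1.0)
    return bev
```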
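The encoder-decoder fusion can be sketched in a few lines of PyTorch. The dimensions, query counts, and head layouts below (waypoint, density-map, and traffic-state queries) are assumptions chosen for illustration, not the paper's published configuration.

```python
import torch
import torch.nn as nn

class FusionTransformerSketch(nn.Module):
    """Minimal encoder-decoder fusion over camera and LiDAR tokens."""
    def __init__(self, d_model=256, nhead=8, num_layers=6,
                 num_waypoints=10, density_hw=20):
        super().__init__()
        # Learnable per-sensor embedding so the encoder can tell camera
        # tokens from LiDAR tokens after concatenation.
        self.sensor_embed = nn.Embedding(2, d_model)  # 0: camera, 1: LiDAR
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers)
        # Learned queries: waypoints, density-map cells, one traffic state.
        n_queries = num_waypoints + density_hw * density_hw + 1
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.waypoint_head = nn.Linear(d_model, 2)  # (x, y) per waypoint
        self.density_head = nn.Linear(d_model, 7)   # per-cell object attributes
        self.traffic_head = nn.Linear(d_model, 3)   # e.g. light / stop / junction
        self.num_waypoints = num_waypoints

    def forward(self, cam_tokens, lidar_tokens):
        # cam_tokens: (B, Nc, d); lidar_tokens: (B, Nl, d) from CNN backbones.
        cam = cam_tokens + self.sensor_embed.weight[0]
        lidar = lidar_tokens + self.sensor_embed.weight[1]
        memory = self.encoder(torch.cat([cam, lidar], dim=1))
        q = self.queries.unsqueeze(0).expand(cam.size(0), -1, -1)
        out = self.decoder(q, memory)
        waypoints = self.waypoint_head(out[:, :self.num_waypoints])
        density = self.density_head(out[:, self.num_waypoints:-1])
        traffic = self.traffic_head(out[:, -1])
        return waypoints, density, traffic
```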
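Finally, a hedged sketch of the safety-controller idea: read obstacles off the predicted density map and clip the target speed so the car could brake to a stop first (v² = 2ad). The grid layout, existence threshold, and deceleration limit are illustrative assumptions; the paper's controller is more elaborate (it also forecasts the motion of dynamic agents).

```python
import numpy as np

def safe_target_speed(density_map, desired_speed,
                      cell_size_m=1.0, exist_thresh=0.5, max_decel=4.0):
    """Clip desired_speed so the ego car could brake to a stop before
    the nearest detected object (v_max = sqrt(2 * a * d))."""
    probs = density_map[..., 0]                 # channel 0: existence prob
    rows, cols = np.nonzero(probs > exist_thresh)
    if rows.size == 0:
        return desired_speed                    # nothing constrains the action
    # Assume the ego vehicle sits at the bottom-center of the BEV grid.
    ego_row, ego_col = probs.shape[0] - 1, probs.shape[1] // 2
    dists = np.hypot(rows - ego_row, cols - ego_col) * cell_size_m
    v_max = np.sqrt(2.0 * max_decel * dists.min())
    return min(desired_speed, v_max)

# Usage: clamp the planner's target speed before the low-level controller.
# target = safe_target_speed(pred_density_map, desired_speed=8.0)
```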
Experimental Evaluation
Comprehensive experiments on CARLA benchmarks show InterFuser outperforming prior methods. At the time of publication, the model ranked first on the public CARLA Leaderboard, with significant gains in driving score achieved through better scene comprehension and the added safety layer. For reference, the leaderboard's driving score combines route completion with a multiplicative infraction penalty, as sketched below.
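The sketch below reproduces that scoring formula, using the penalty coefficients published in the CARLA Leaderboard rules (these come from the leaderboard documentation, not the paper itself, and are worth double-checking against the current rules).

```python
# Multiplicative penalty factors per infraction, per CARLA Leaderboard rules.
PENALTY = {
    "collision_pedestrian": 0.50,
    "collision_vehicle": 0.60,
    "collision_static": 0.65,
    "red_light": 0.70,
    "stop_sign": 0.80,
}

def driving_score(route_completion_pct, infractions):
    """Route completion scaled by a multiplicative infraction penalty."""
    penalty = 1.0
    for name in infractions:
        penalty *= PENALTY[name]
    return route_completion_pct * penalty

# 90% completion, one vehicle collision and one red-light infraction:
# 90 * 0.60 * 0.70 = 37.8
print(driving_score(90.0, ["collision_vehicle", "red_light"]))
```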
Key Contributions
- Development of InterFuser, a sensor fusion transformer that performs global contextual reasoning across multiple modalities and views.
- Introduction of intermediate interpretable features to significantly enhance safety and interpretability in autonomous driving applications.
- State-of-the-art performance on complex urban driving scenarios, validated through empirical testing on established benchmarks.
Implications and Future Outlook
From a theoretical standpoint, this research advances the integration of structured interpretability into deep learning models for autonomous vehicles. Practically, InterFuser offers a robust framework with potential scalability to real-world deployment, addressing longstanding barriers in sensor integration and model transparency. Future directions include more advanced trajectory prediction models and controllers, incorporating temporal information for improved performance, and extending the approach to diverse, unpredictable real-world driving conditions. Overall, the work is a meaningful step toward safer and more reliable autonomous driving systems.