HE-Drive: Human-Like End-to-End Driving with Vision Language Models
Abstract: In this paper, we propose HE-Drive, the first human-like-centric end-to-end autonomous driving system that generates trajectories which are both temporally consistent and comfortable. Recent studies have shown that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select accurate trajectories that closely mimic expert demonstrations. However, such trajectory planners and scorers still tend to produce temporally inconsistent and uncomfortable trajectories. To solve these problems, our HE-Drive first extracts key 3D spatial representations through sparse perception, which then serve as conditional inputs for a Conditional Denoising Diffusion Probabilistic Models (DDPMs)-based motion planner that generates temporally consistent multi-modal trajectories. A Vision Language Models (VLMs)-guided trajectory scorer subsequently selects the most comfortable trajectory from these candidates to control the vehicle, ensuring human-like end-to-end driving. Experiments show that HE-Drive not only achieves state-of-the-art performance (i.e., a 71% lower average collision rate than VAD) and efficiency (i.e., 1.9x faster than SparseDrive) on the challenging nuScenes and OpenScene datasets, but also provides the most comfortable driving experience on real-world data. For more information, visit the project website: https://jmwang0117.github.io/HE-Drive/.
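The pipeline described in the abstract — sample noise, iteratively denoise it conditioned on spatial features to obtain multi-modal candidate trajectories, then select the most comfortable one — can be sketched minimally in Python. This is a toy illustration, not the paper's implementation: the epsilon-predictor, the zero conditioning vector, and the jerk-based comfort score are all illustrative assumptions standing in for the learned networks and the VLM-guided scorer.

```python
import math
import random

T = 50   # number of diffusion steps (assumed)
H = 6    # planning horizon: 6 scalar waypoints, for simplicity

# Standard linear DDPM noise schedule.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bar, prod = [], 1.0
for a in alphas:
    prod *= a
    alpha_bar.append(prod)

def denoiser(x, t, cond):
    # Stand-in for the learned noise-prediction network conditioned on
    # sparse-perception features `cond`; it merely nudges samples
    # toward the conditioning anchor.
    return [0.1 * (xi - ci) for xi, ci in zip(x, cond)]

def sample_trajectory(cond, rng):
    # Reverse diffusion: start from Gaussian noise, denoise step by step.
    x = [rng.gauss(0.0, 1.0) for _ in range(H)]
    for t in reversed(range(T)):
        eps = denoiser(x, t, cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bar[t])
        x = [(xi - coef * ei) / math.sqrt(alphas[t])
             for xi, ei in zip(x, eps)]
        if t > 0:  # add noise at every step except the last
            x = [xi + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) for xi in x]
    return x

def comfort_score(traj):
    # Smoothness proxy: lower summed |jerk| (third finite difference)
    # means a smoother, more comfortable trajectory.
    jerk = [traj[i + 3] - 3 * traj[i + 2] + 3 * traj[i + 1] - traj[i]
            for i in range(len(traj) - 3)]
    return -sum(abs(j) for j in jerk)

rng = random.Random(0)
cond = [0.0] * H  # placeholder for the 3D spatial representation
candidates = [sample_trajectory(cond, rng) for _ in range(3)]  # multi-modal
best = max(candidates, key=comfort_score)  # scorer picks the smoothest
```

In the actual system the scorer is a VLM rather than a hand-written smoothness metric, but the selection structure — generate several candidates, then rank and pick one — is the same.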
- GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11621–11631, 2020.
- Vadv2: End-to-end vectorized autonomous driving via probabilistic planning. arXiv preprint arXiv:2402.13243, 2024.
- Pluto: Pushing the limit of imitation learning-based planning for autonomous driving. arXiv preprint arXiv:2404.14327, 2024.
- Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 2024.
- Neat: Neural attention fields for end-to-end autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15793–15803, 2021.
- Transfuser: Imitation with transformer-based sensor fusion for autonomous driving. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11):12878–12895, 2022.
- OpenScene Contributors. Openscene: The largest up-to-date 3d occupancy prediction benchmark in autonomous driving. https://github.com/OpenDriveLab/OpenScene, 2023.
- Parting with misconceptions about learning-based vehicle motion planning. In Conference on Robot Learning, pp. 1268–1281. PMLR, 2023.
- Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. arXiv preprint arXiv:2406.15349, 2024.
- The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
- Baidu apollo em motion planner. arXiv preprint arXiv:1807.08048, 2018.
- An efficient spatial-temporal trajectory planner for autonomous vehicles in unstructured environments. IEEE Transactions on Intelligent Transportation Systems, 2023.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
- Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
- Safe local motion planning with self-supervised freespace forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12732–12741, 2021.
- St-p3: End-to-end vision-based autonomous driving via spatial-temporal feature learning. In European Conference on Computer Vision, pp. 533–549. Springer, 2022.
- Planning-oriented autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17853–17862, 2023.
- Vad: Vectorized scene representation for efficient autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8340–8350, 2023.
- Differentiable raycasting for self-supervised occupancy forecasting. In European Conference on Computer Vision, pp. 353–369. Springer, 2022.
- Hydra-mdp: End-to-end multimodal planning with multi-target hydra-distillation. arXiv preprint arXiv:2406.06978, 2024.
- Skilldiffuser: Interpretable hierarchical planning via skill abstractions in diffusion-based task execution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16467–16476, 2024.
- I. Loshchilov. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
- Potential based diffusion motion planning. arXiv preprint arXiv:2407.06169, 2024.
- Film: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
- Multi-modal fusion transformer for end-to-end autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 7077–7087, 2021.
- Maximum margin planning. In Proceedings of the 23rd international conference on Machine learning, pp. 729–736, 2006.
- Lmdrive: Closed-loop end-to-end driving with large language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15120–15130, 2024.
- Drivelm: Driving with graph visual question answering. arXiv preprint arXiv:2312.14150, 2023.
- Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.
- Nomad: Goal masked diffusion policies for navigation and exploration. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 63–70. IEEE, 2024.
- Sparsedrive: End-to-end autonomous driving via sparse scene representation. arXiv preprint arXiv:2405.19620, 2024.
- Hpnet: Dynamic trajectory forecasting with historical prediction attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15261–15270, 2024.
- Scene as occupancy. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8406–8415, 2023.
- Congested traffic states in empirical observations and microscopic simulations. Physical review E, 62(2):1805, 2000.
- Drivegpt4: Interpretable end-to-end autonomous driving via large language model. IEEE Robotics and Automation Letters, 2024a.
- Hallucination is inevitable: An innate limitation of large language models. arXiv preprint arXiv:2401.11817, 2024b.
- 3d diffusion policy. arXiv preprint arXiv:2403.03954, 2024.
- Tnt: Target-driven trajectory prediction. In Conference on Robot Learning, pp. 895–904. PMLR, 2021.
- Occworld: Learning a 3d occupancy world model for autonomous driving. In European Conference on Computer Vision. Springer, 2024a.
- Genad: Generative end-to-end autonomous driving. arXiv preprint arXiv:2402.11502, 2024b.
- Query-centric trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17863–17873, 2023.