DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving (2411.15139v1)

Published 22 Nov 2024 in cs.CV and cs.RO

Abstract: Recently, the diffusion model has emerged as a powerful generative technique for robotic policy learning, capable of modeling multi-mode action distributions. Leveraging its capability for end-to-end autonomous driving is a promising direction. However, the numerous denoising steps in the robotic diffusion policy and the more dynamic, open-world nature of traffic scenes pose substantial challenges for generating diverse driving actions at a real-time speed. To address these challenges, we propose a novel truncated diffusion policy that incorporates prior multi-mode anchors and truncates the diffusion schedule, enabling the model to learn denoising from anchored Gaussian distribution to the multi-mode driving action distribution. Additionally, we design an efficient cascade diffusion decoder for enhanced interaction with conditional scene context. The proposed model, DiffusionDrive, demonstrates 10$\times$ reduction in denoising steps compared to vanilla diffusion policy, delivering superior diversity and quality in just 2 steps. On the planning-oriented NAVSIM dataset, with the aligned ResNet-34 backbone, DiffusionDrive achieves 88.1 PDMS without bells and whistles, setting a new record, while running at a real-time speed of 45 FPS on an NVIDIA 4090. Qualitative results on challenging scenarios further confirm that DiffusionDrive can robustly generate diverse plausible driving actions. Code and model will be available at https://github.com/hustvl/DiffusionDrive.

Summary

The paper presents a novel truncated diffusion policy that leverages anchored Gaussian priors to reduce computation while maintaining diverse driving actions.
The model uses a transformer-based cascade diffusion decoder with sparse deformable attention to integrate scene context and refine trajectory predictions.
The method reaches 88.1 PDMS on NAVSIM with a ResNet-34 backbone and runs at 45 FPS on an NVIDIA 4090, outperforming prior state-of-the-art approaches.

Summary of "DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving"

The paper "DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving" applies diffusion models to end-to-end autonomous driving. The authors propose a truncated diffusion policy that keeps the multi-mode generative capacity of diffusion models while cutting the number of denoising steps enough to run in real time.

Background and Motivation

Diffusion models have proven effective for robotic policy learning, in particular for modeling multi-mode action distributions. Applying them to autonomous driving raises two problems: a robotic diffusion policy needs many denoising steps, and traffic scenes are more dynamic and open-world than typical robotic settings, so generating diverse, high-quality driving actions at real-time speed is hard. Existing alternatives that rely on a large fixed vocabulary of anchor trajectories face scalability and computational challenges of their own. The authors therefore aim for a diffusion-based approach that reduces computational overhead while improving trajectory diversity.

Methodological Advancements

Truncated Diffusion Policy

A core contribution of the paper is the truncated diffusion policy. A vanilla diffusion policy starts denoising from pure Gaussian noise; the truncated policy instead starts from an anchored Gaussian distribution, built around a predefined set of multi-mode driving patterns, or "anchors," that guide the diffusion process. The model learns to denoise from this anchored distribution to the multi-mode driving action distribution. Because the diffusion schedule is truncated, inference needs only 2 denoising steps, a 10x reduction compared with the vanilla diffusion policy, while the model can still generate diverse actions.
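The idea can be illustrated with a minimal, standard-library Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the source: the linear beta schedule, the truncation point `t_trunc=50` out of 1000 training steps, the DDIM-style update, the three toy anchors, and the hypothetical `nearest_anchor_denoiser`, which stands in for the learned network.

```python
import math
import random

# Toy anchor trajectories as (x, y) waypoints in metres (illustrative values only).
ANCHORS = [
    [(0.0, 2.0), (0.0, 4.0), (0.0, 6.0)],     # go straight
    [(-1.0, 2.0), (-2.5, 4.0), (-4.5, 6.0)],  # turn left
    [(1.0, 2.0), (2.5, 4.0), (4.5, 6.0)],     # turn right
]

def alpha_bar(t, num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_s) for s = 1..t under an assumed linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_train_steps - 1)
        prod *= 1.0 - beta
    return prod

def noise_anchor(anchor, t, rng):
    """Draw one sample from the anchored Gaussian: scaled anchor plus Gaussian noise."""
    ab = alpha_bar(t)
    scale, sigma = math.sqrt(ab), math.sqrt(1.0 - ab)
    return [(scale * x + sigma * rng.gauss(0, 1), scale * y + sigma * rng.gauss(0, 1))
            for x, y in anchor]

def ddim_step(x_t, x0_pred, t, t_next):
    """Deterministic DDIM-style update from step t to t_next, given a clean prediction."""
    ab_t, ab_n = alpha_bar(t), alpha_bar(t_next)
    out = []
    for (xt, yt), (x0, y0) in zip(x_t, x0_pred):
        # Recover the noise implied by the current sample and the predicted clean point.
        ex = (xt - math.sqrt(ab_t) * x0) / math.sqrt(1.0 - ab_t)
        ey = (yt - math.sqrt(ab_t) * y0) / math.sqrt(1.0 - ab_t)
        out.append((math.sqrt(ab_n) * x0 + math.sqrt(1.0 - ab_n) * ex,
                    math.sqrt(ab_n) * y0 + math.sqrt(1.0 - ab_n) * ey))
    return out

def nearest_anchor_denoiser(x_t, t):
    """Hypothetical stand-in for the learned denoiser: snap to the closest anchor."""
    def dist(anchor):
        return sum((a - b) ** 2 + (c - d) ** 2 for (a, c), (b, d) in zip(x_t, anchor))
    return min(ANCHORS, key=dist)

def sample(anchors, denoiser, t_trunc=50, num_steps=2, seed=0):
    """Start from noised anchors at t_trunc and denoise in num_steps steps down to t=0."""
    rng = random.Random(seed)
    timesteps = [t_trunc * (num_steps - i) // num_steps for i in range(num_steps + 1)]
    trajs = [noise_anchor(a, t_trunc, rng) for a in anchors]
    for t, t_next in zip(timesteps, timesteps[1:]):
        trajs = [ddim_step(x, denoiser(x, t), t, t_next) for x in trajs]
    return trajs
```

Calling `sample(ANCHORS, nearest_anchor_denoiser)` returns one trajectory per anchor. Because sampling starts at a small `t_trunc` rather than at the end of the full schedule, the initial samples are only lightly perturbed anchors, which is why a couple of denoising steps can be enough in this toy setting.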

Efficient Cascade Diffusion Decoder

To complement the truncated diffusion policy, the authors propose a transformer-based cascade diffusion decoder designed to strengthen interaction with the conditional scene context, which is crucial for driving. The decoder uses a sparse deformable attention mechanism and refines trajectory predictions over multiple layers, each of which interacts with the conditioned scene information again, so that later layers plan from a better-informed estimate.
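The refinement loop can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the authors' decoder: the scene context is reduced to a 2D grid of scalar features (real features would be vectors), "sparse" attention is reduced to bilinear sampling of that grid at the current waypoints, and `bilinear_sample`, `cascade_refine`, and the `layers` callables are hypothetical names.

```python
def bilinear_sample(bev, x, y):
    """Bilinearly interpolate the grid bev[row][col] at continuous (x=col, y=row)."""
    h, w = len(bev), len(bev[0])
    # Clamp the query point to the grid so out-of-range waypoints stay valid.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = bev[y0][x0] * (1 - fx) + bev[y0][x1] * fx
    bottom = bev[y1][x0] * (1 - fx) + bev[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def cascade_refine(traj, bev, layers):
    """Apply the layers in sequence; each one re-samples features at the updated waypoints."""
    for layer in layers:
        feats = [bilinear_sample(bev, x, y) for x, y in traj]
        traj = layer(traj, feats)
    return traj

def toy_layer(traj, feats):
    """Hypothetical layer: shift x by a fixed amount and y by the sampled feature."""
    return [(x + 0.5, y + f) for (x, y), f in zip(traj, feats)]
```

For example, `cascade_refine([(0.0, 0.0)], [[0.0, 1.0], [2.0, 3.0]], [toy_layer, toy_layer])` runs two refinement layers, and the second layer samples the grid at the waypoint produced by the first. Sampling only at the waypoints keeps the cost proportional to the number of waypoints rather than to the size of the feature map.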

Experimental Results and Analysis

On the planning-oriented NAVSIM dataset, DiffusionDrive outperforms existing state-of-the-art end-to-end planning methods. With a ResNet-34 backbone it achieves 88.1 PDMS, surpassing approaches such as VADv2 and Hydra-MDP, which rely on large anchor vocabularies and post-processing techniques. It also runs at 45 FPS on an NVIDIA 4090 GPU, which the authors describe as real-time speed. Qualitative results on challenging scenarios further show that DiffusionDrive generates diverse, contextually plausible trajectories.

Implications and Future Work

The results show that diffusion models can be integrated into end-to-end autonomous driving without giving up real-time performance. The combination of computational efficiency and multi-mode generation makes the approach a candidate for broader use in dynamic driving environments. Future work may integrate the framework with other sensory modalities or extend it to other robotics domains that require real-time decision-making.

In conclusion, the paper makes a strong case for diffusion models in autonomous driving, pairing a methodological change (truncated diffusion from anchored Gaussians) with a practical result (real-time planning at 88.1 PDMS). The approach is likely to encourage further research on adaptive, multi-mode driving policies for autonomous vehicles.
