Generalizing Motion Planners with Mixture of Experts for Autonomous Driving (2410.15774v2)

Published 21 Oct 2024 in cs.RO and cs.CV

Abstract: Large real-world driving datasets have sparked significant research into various aspects of data-driven motion planners for autonomous driving. These include data augmentation, model architecture, reward design, training strategies, and planner pipelines. These planners promise better generalizations on complicated and few-shot cases than previous methods. However, experiment results show that many of these approaches produce limited generalization abilities in planning performance due to overly complex designs or training paradigms. In this paper, we review and benchmark previous methods focusing on generalizations. The experimental results indicate that as models are appropriately scaled, many design elements become redundant. We introduce StateTransformer-2 (STR2), a scalable, decoder-only motion planner that uses a Vision Transformer (ViT) encoder and a mixture-of-experts (MoE) causal Transformer architecture. The MoE backbone addresses modality collapse and reward balancing by expert routing during training. Extensive experiments on the NuPlan dataset show that our method generalizes better than previous approaches across different test sets and closed-loop simulations. Furthermore, we assess its scalability on billions of real-world urban driving scenarios, demonstrating consistent accuracy improvements as both data and model size grow.

Summary

The paper introduces StateTransformer-2 (STR2), a decoder-only motion planner using a mixture-of-experts to counter modality collapse in autonomous driving.
It benchmarks STR2 on the NuPlan dataset, demonstrating superior generalization and scalability in both open-loop and closed-loop simulation scenarios.
The study shows that increasing model size and leveraging high-quality, diverse data significantly enhance performance for reliable autonomous driving applications.

Generalizing Motion Planners with Mixture of Experts for Autonomous Driving

This paper investigates the challenges and advances in scaling learning-based motion planners for autonomous driving, with an emphasis on generalization. The authors examine how model complexity and training methods influence planning performance. They present StateTransformer-2 (STR2), a scalable, decoder-only motion planner that combines a Vision Transformer (ViT) encoder with a mixture-of-experts (MoE) backbone to address modality collapse and reward balancing, especially in complex urban environments.
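
To make the expert-routing idea concrete, the following is a minimal sketch of top-k MoE routing for a single token, written in plain Python. It is not the STR2 implementation: the class name `ToyMoELayer`, the linear experts, the random weights, and the choice of `top_k=2` are all illustrative assumptions, and the summary does not specify how many experts STR2 uses or how its router is trained.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyMoELayer:
    """Hypothetical MoE layer: a linear router plus linear experts."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one score row per expert (assumed linear gating).
        self.router = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Experts: one dim x dim matrix each (stand-ins for feed-forward blocks).
        self.experts = [[[rng.gauss(0.0, 0.1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def route(self, token):
        """Return the indices of the top-k experts and their mixing weights."""
        logits = matvec(self.router, token)
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[:self.top_k]
        # Renormalize over the selected experts only.
        weights = softmax([logits[i] for i in chosen])
        return chosen, weights

    def forward(self, token):
        """Weighted sum of the outputs of the selected experts."""
        chosen, weights = self.route(token)
        output = [0.0] * len(token)
        for index, weight in zip(chosen, weights):
            expert_out = matvec(self.experts[index], token)
            output = [o + weight * e for o, e in zip(output, expert_out)]
        return output


layer = ToyMoELayer(dim=4, num_experts=8, top_k=2)
chosen, weights = layer.route([1.0, 0.5, -0.3, 2.0])
mixed = layer.forward([1.0, 0.5, -0.3, 2.0])
```

Because only the selected experts run for each token, different driving behaviors can in principle be handled by different sub-networks rather than averaged inside one shared network, which is the intuition behind using MoE against modality collapse.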

Key Contributions and Findings

The paper presents several noteworthy contributions:

Introduction of StateTransformer-2 (STR2): STR2 uses a decoder-only architecture with MoE layers, which distribute learning among expert sub-networks so that conflicting or nuanced driving policies can be modeled effectively. The approach mitigates modality collapse, a common problem that leads to overly smoothed trajectories.
Benchmarking on NuPlan Dataset: STR2 is benchmarked against several state-of-the-art methods using the comprehensive NuPlan dataset. The results demonstrate superior generalization capabilities, owing partly to the model's scalable design and autoregressive training methodology. The authors emphasize the need for large-scale, high-quality data, selecting the NuPlan dataset for its diversity and comprehensiveness.
Performance Under Closed-Loop Simulations: STR2 achieves better results than existing methods across diverse simulation scenarios. The paper also reports consistent accuracy improvements as both data and model size increase, indicating that the architecture scales well.
Scaling Laws and Model Size Impact: Comprehensive experiments show that increasing dataset size and model parameters improves generalization performance. STR2's architecture allows efficient scaling, drawing parallels to the scaling successes seen in large language models (LLMs).
Open-Loop vs. Closed-Loop Performance: The paper provides a detailed exploration of STR2's performance across open-loop and closed-loop simulations. While open-loop metrics gauge fitting capabilities, closed-loop scenarios, especially with reactive agents, test its real-world applicability in dynamic, interaction-rich driving environments.
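
As an illustration of what an open-loop metric measures, the sketch below computes average and final displacement error (ADE and FDE) between a predicted trajectory and a logged one. These are common open-loop trajectory metrics in general; the summary does not state which exact metrics the paper reports, so the function name `displacement_errors` and the sample waypoints are assumptions for illustration only.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) waypoints.

    ADE is the mean Euclidean distance over all waypoints;
    FDE is the distance at the final waypoint.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equal in length")
    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    return sum(distances) / len(distances), distances[-1]


# Hypothetical waypoints in meters: the plan drifts away from the logged path.
planned = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
logged = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(planned, logged)
print(ade, fde)  # 1.0 2.0
```

A metric like this only compares the plan against recorded human driving. In a closed-loop simulation the planner's own actions change the future state, so small open-loop errors can compound, which is why the two settings are evaluated separately.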

Implications

The practical implications of STR2 are significant for the field of autonomous driving. The scalable architecture allows for improved generalization capabilities, which are crucial in real-world applications where unexpected or few-shot scenarios frequently arise. The reliance on self-supervised learning without complex reward engineering makes the approach more adaptable and potentially more efficient to implement at scale.

Theoretically, this work adds to the growing body of research supporting scaled models and mixture-of-experts architectures in various domains. It also underscores how design simplicity, when combined with effective scaling strategies, can yield superior performance without the pitfalls of traditional complex designs.

Future Directions

The paper opens several avenues for future research. There are opportunities to further explore the interplay between model size, data diversity, and generalization capabilities. Investigating different mixture-of-experts configurations could also yield solutions better tailored to distinct driving contexts. Lastly, inference-time optimization will be critical for real-time deployment of these architectures on edge devices.

In conclusion, the paper provides valuable insights into leveraging model scale and data richness to generalize motion planning in autonomous driving. It promises impactful applications in the industry, potentially accelerating the maturity and reliability of autonomous vehicle fleets.
