
RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios (2312.13303v2)

Published 19 Dec 2023 in cs.LG and cs.AI

Abstract: Simulation plays a crucial role in the development of autonomous vehicles (AVs) due to the potential risks associated with real-world testing. Although significant progress has been made in the visual aspects of simulators, generating complex behavior among agents remains a formidable challenge. It is not only imperative to ensure realism in the scenarios generated but also essential to incorporate preferences and conditions to facilitate controllable generation for AV training and evaluation. Traditional methods, mainly relying on memorizing the distribution of training datasets, often fall short in generating unseen scenarios. Inspired by the success of retrieval augmented generation in LLMs, we present RealGen, a novel retrieval-based in-context learning framework for traffic scenario generation. RealGen synthesizes new scenarios by combining behaviors from multiple retrieved examples in a gradient-free way, which may originate from templates or tagged scenarios. This in-context learning framework endows versatile generative capabilities, including the ability to edit scenarios, compose various behaviors, and produce critical scenarios. Evaluations show that RealGen offers considerable flexibility and controllability, marking a new direction in the field of controllable traffic scenario generation. Check our project website for more information: https://realgen.github.io.

Authors (5)
  1. Wenhao Ding (43 papers)
  2. Yulong Cao (26 papers)
  3. Ding Zhao (172 papers)
  4. Chaowei Xiao (110 papers)
  5. Marco Pavone (314 papers)
Citations (13)

Summary

  • The paper presents RealGen, a retrieval-augmented generation framework that leverages external databases and in-context learning for controllable traffic simulation.
  • It demonstrates notable improvements in reconstruction metrics (mADE 0.31 and mFDE 0.53) and diverse scenario generation compared to traditional models.
  • The approach offers practical benefits for autonomous vehicle simulations, enabling the generation of novel edge-case scenarios for enhanced training robustness.

An Overview of RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios

The paper "RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios" presents a novel approach to generating traffic scenarios for autonomous vehicle (AV) simulation. The proposed methodology leverages a retrieval-augmented generation framework, RealGen, which aims to address the limitations of traditional data-driven simulation methods. These conventional methods often rely on memorizing training data distributions and thus struggle to generate unseen scenarios, a critical need for robust AV training and evaluation.

Concept and Framework

RealGen integrates retrieval-augmented generation (RAG) with an in-context learning framework. Traditional methods retain all knowledge within their model parameters, whereas RealGen queries an external database for relevant scenarios at generation time. This approach is inspired by the success of RAG in large language models (LLMs). RealGen combines behaviors from multiple retrieved scenarios in a gradient-free manner, which makes it possible to generate realistic and controllable traffic scenarios without retraining the model.
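The retrieval step can be pictured as a nearest-neighbor lookup over scenario embeddings. The sketch below is illustrative only: it assumes each scenario has already been encoded as a fixed-length vector and uses cosine similarity as the ranking score, which is an assumption rather than a detail confirmed by this summary. The function name `retrieve_top_k` and the toy database are hypothetical.

```python
import numpy as np

def retrieve_top_k(query, database, k=3):
    """Return indices and cosine similarities of the k database rows closest to query.

    query:    (dim,) embedding of the query scenario (hypothetical encoding).
    database: (num_scenarios, dim) embeddings of stored scenarios.
    """
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    # Normalize so that the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q
    # Sort in descending order of similarity and keep the first k.
    top = np.argsort(-scores)[:k]
    return top, scores[top]

# Toy database of four 2-D "scenario embeddings" (made-up values).
toy_database = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
toy_indices, toy_scores = retrieve_top_k([1.0, 0.1], toy_database, k=2)
```

With this toy data the two closest rows are the first and third entries. No model parameters are updated during the lookup, which is the sense in which the summary calls the process gradient-free.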

The architecture of RealGen includes an autoencoder model to extract latent scenario embeddings and a novel combinatory component termed "Combiner." The Combiner synthesizes a new behavior embedding by blending features from multiple retrieved scenarios. The system is designed not only to retrieve and replicate scenarios closely resembling real-world situations but also to generate novel scenarios by reassembling elements of known scenarios in nuanced combinations.
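To make the idea of blending retrieved embeddings concrete, here is a deliberately simple stand-in: a softmax-weighted average of the retrieved vectors. This is not the paper's Combiner, which is a learned component whose internals are not described in this summary; the function `blend_embeddings`, the `temperature` parameter, and the weighting scheme are all assumptions made for illustration.

```python
import numpy as np

def blend_embeddings(retrieved, scores, temperature=1.0):
    """Blend retrieved embeddings with softmax weights derived from their scores.

    retrieved:   (k, dim) embeddings of the retrieved scenarios.
    scores:      (k,) relevance scores, higher meaning more relevant.
    temperature: hypothetical knob; lower values favor the top-scoring item.
    """
    retrieved = np.asarray(retrieved, dtype=float)
    logits = np.asarray(scores, dtype=float) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()
    # Weighted sum over the k retrieved embeddings gives one (dim,) vector.
    return weights @ retrieved, weights

# Two retrieved embeddings with equal scores blend to their midpoint.
blended, weights = blend_embeddings([[0.0, 0.0], [2.0, 4.0]], [1.0, 1.0])
```

A learned combiner can capture interactions between retrieved behaviors that a fixed weighted average cannot, so this sketch only shows the shape of the inputs and output: several retrieved embeddings in, one new behavior embedding out.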

Evaluation Metrics and Results

RealGen was evaluated against several baselines on the nuScenes dataset, accessed through trajdata. Key performance metrics included mean average displacement error (mADE), mean final displacement error (mFDE), velocity and heading consistency with real-world data, collision rate, and off-road rate. In reconstruction-based generation (i.e., using the original target scenario as input), RealGen-AE reported an mADE of 0.31 and an mFDE of 0.53, indicating effective reconstruction compared to traditional autoencoders and masked autoencoders.
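For readers unfamiliar with the displacement metrics, the following minimal sketch shows how average and final displacement errors are commonly computed from trajectories. It assumes positions are stored as (agents, timesteps, 2) arrays and that every timestep is valid; the paper's exact aggregation over agents and scenarios may differ, and the function name `displacement_errors` is hypothetical.

```python
import numpy as np

def displacement_errors(pred, target):
    """Compute average and final displacement error between two sets of trajectories.

    pred, target: (num_agents, num_steps, 2) arrays of x, y positions.
    Returns (ade, fde), each averaged over agents.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Euclidean distance per agent and timestep: shape (num_agents, num_steps).
    dist = np.linalg.norm(pred - target, axis=-1)
    ade = dist.mean(axis=1).mean()  # mean over timesteps, then over agents
    fde = dist[:, -1].mean()        # last timestep only, mean over agents
    return ade, fde

# Two agents, four timesteps; every prediction is offset by (3, 4) from the target,
# so each per-step distance is 5 and both metrics equal 5.
toy_target = np.zeros((2, 4, 2))
toy_pred = toy_target + np.array([3.0, 4.0])
toy_ade, toy_fde = displacement_errors(toy_pred, toy_target)
```

Lower values are better for both metrics: a value of 0 would mean the generated trajectories coincide with the reference trajectories at every timestep.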

In retrieval-based settings (i.e., scenarios generated from retrieved scenarios rather than from the target itself), RealGen achieved lower mADE and mFDE than the AE-KNN baseline, indicating that it makes effective use of retrieved data to generate high-quality scenarios. The research demonstrates RealGen's capability not only in accurate scenario replication but also in generating diverse and contextually relevant novel scenarios, such as crash scenarios and scenarios characterized by specific behavioral tags.

Implications and Future Directions

The introduction of RealGen has potential implications for both practical applications and theoretical developments in AI simulation. Practically, RealGen provides a scalable and adaptable framework for generating traffic scenarios in AV simulators, capable of including rare and critical edge cases that are crucial for comprehensive AV training. The model's ability to generate new, potentially unseen scenarios offers a unique opportunity to enhance robustness and safety evaluations for AV systems.

Theoretically, this research highlights the potential of retrieval-augmented methods to extend generation beyond what a model can store in its parameters. The work also points toward richer scenario representations, possibly involving deeper interactions between agents and their environments.

In conclusion, RealGen marks a significant stride towards addressing the challenges of scenario generation in AV simulation, and it opens new pathways for retrieval-based frameworks in other domains requiring scene-level generation and evaluation. Further advancements may focus on extending the scope of behavior encoding to encompass richer and intertwined agent-environment interactions, thereby broadening the applicability and realism of scenario generation in autonomous systems research.
