RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios (2312.13303v2)
Abstract: Simulation plays a crucial role in the development of autonomous vehicles (AVs) due to the potential risks associated with real-world testing. Although significant progress has been made in the visual aspects of simulators, generating complex behavior among agents remains a formidable challenge. It is not only imperative to ensure realism in the generated scenarios but also essential to incorporate preferences and conditions to facilitate controllable generation for AV training and evaluation. Traditional methods, which mainly rely on memorizing the distribution of training datasets, often fall short in generating unseen scenarios. Inspired by the success of retrieval augmented generation in LLMs, we present RealGen, a novel retrieval-based in-context learning framework for traffic scenario generation. RealGen synthesizes new scenarios in a gradient-free way by combining behaviors from multiple retrieved examples, which may originate from templates or tagged scenarios. This in-context learning framework provides versatile generative capabilities, including the ability to edit scenarios, compose various behaviors, and produce critical scenarios. Evaluations show that RealGen offers considerable flexibility and controllability, marking a new direction in the field of controllable traffic scenario generation. Check our project website for more information: https://realgen.github.io.
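The retrieval step mentioned in the abstract can be illustrated with a generic nearest-neighbor lookup. The sketch below is not RealGen's implementation: it assumes each scenario has already been encoded as a fixed-length embedding vector, and it uses cosine similarity and a hypothetical helper `retrieve_top_k`, none of which are specified in the abstract. The embeddings are toy values chosen for illustration only.

```python
import numpy as np

def retrieve_top_k(query, database, k=3):
    """Return indices and cosine similarities of the k rows of `database`
    most similar to `query`. Hypothetical helper, not from the paper."""
    # Normalize so that the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q
    # Sort by descending similarity and keep the first k entries.
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Toy 2-D "scenario embeddings" (assumed values, purely illustrative).
database = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [-1.0, 0.0],
])
query = np.array([1.0, 0.1])

idx, sims = retrieve_top_k(query, database, k=2)
print(idx, sims)
```

In a retrieval-augmented generator, the retrieved examples would then be passed to a generative model as context. How RealGen combines them is described in the paper itself, not in this sketch.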
Authors: Wenhao Ding, Yulong Cao, Ding Zhao, Chaowei Xiao, Marco Pavone