Trajeglish: Traffic Modeling as Next-Token Prediction (2312.04535v2)
Abstract: A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. In pursuit of this functionality, we apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. Using a simple data-driven tokenization scheme, we discretize trajectories to centimeter-level resolution using a small vocabulary. We then model the multi-agent sequence of discrete motion tokens with a GPT-like encoder-decoder that is autoregressive in time and takes into account intra-timestep interaction between agents. Scenarios sampled from our model exhibit state-of-the-art realism; our model tops the Waymo Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%. We ablate our modeling choices in full autonomy and partial autonomy settings, and show that the representations learned by our model can quickly be adapted to improve performance on nuScenes. We additionally evaluate the scalability of our model with respect to parameter count and dataset size, and use density estimates from our model to quantify the saliency of context length and intra-timestep interaction for the traffic modeling task.
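The tokenization idea in the abstract can be illustrated with a minimal sketch. It assumes each agent state is an `(x, y, heading)` tuple, and that a token names one per-step motion template expressed in the agent's own frame. The vocabulary below is a hand-built grid used only as a stand-in, whereas the abstract describes a data-driven vocabulary. All names (`VOCAB`, `tokenize`, `detokenize`, `heading_weight`) are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical vocabulary: a small hand-built grid of per-step motion templates
# (dx, dy, dheading) in the agent's own frame. The paper derives its vocabulary
# from data; this grid is only a stand-in for illustration.
VOCAB = [
    (dx, dy, dh)
    for dx in (0.0, 0.5, 1.0, 1.5, 2.0)
    for dy in (-0.2, 0.0, 0.2)
    for dh in (-0.1, 0.0, 0.1)
]


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def to_local(state, target):
    """Express the step from `state` to `target` in the frame of `state`."""
    x, y, h = state
    tx, ty, th = target
    wx, wy = tx - x, ty - y
    c, s = math.cos(h), math.sin(h)
    return (c * wx + s * wy, -s * wx + c * wy, wrap_angle(th - h))


def apply_token(state, token):
    """Advance `state` by the motion template that `token` indexes."""
    x, y, h = state
    dx, dy, dh = VOCAB[token]
    c, s = math.cos(h), math.sin(h)
    return (x + c * dx - s * dy, y + s * dx + c * dy, wrap_angle(h + dh))


def nearest_token(delta, heading_weight=1.0):
    """Return the index of the template closest to `delta` (assumed cost)."""
    def cost(i):
        dx, dy, dh = VOCAB[i]
        return ((delta[0] - dx) ** 2 + (delta[1] - dy) ** 2
                + heading_weight * wrap_angle(delta[2] - dh) ** 2)
    return min(range(len(VOCAB)), key=cost)


def tokenize(trajectory):
    """Greedily map a list of states to a list of token indices.

    Each step is measured from the reconstructed state, not the recorded one,
    so quantization error from earlier steps can be corrected later.
    """
    recon = trajectory[0]
    tokens = []
    for target in trajectory[1:]:
        tok = nearest_token(to_local(recon, target))
        tokens.append(tok)
        recon = apply_token(recon, tok)
    return tokens


def detokenize(start, tokens):
    """Roll the tokens forward from `start` to recover a trajectory."""
    states = [start]
    for tok in tokens:
        states.append(apply_token(states[-1], tok))
    return states
```

Calling `tokenize` on a recorded trajectory and then `detokenize` on the result gives an approximation whose error depends on how finely the vocabulary covers the observed motions. In this sketch, measuring each step from the reconstructed state is a design choice that keeps the error from growing along the trajectory.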