Trajeglish: Traffic Modeling as Next-Token Prediction (2312.04535v2)

Published 7 Dec 2023 in cs.LG and cs.RO

Abstract: A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. In pursuit of this functionality, we apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. Using a simple data-driven tokenization scheme, we discretize trajectories to centimeter-level resolution using a small vocabulary. We then model the multi-agent sequence of discrete motion tokens with a GPT-like encoder-decoder that is autoregressive in time and takes into account intra-timestep interaction between agents. Scenarios sampled from our model exhibit state-of-the-art realism; our model tops the Waymo Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%. We ablate our modeling choices in full autonomy and partial autonomy settings, and show that the representations learned by our model can quickly be adapted to improve performance on nuScenes. We additionally evaluate the scalability of our model with respect to parameter count and dataset size, and use density estimates from our model to quantify the saliency of context length and intra-timestep interaction for the traffic modeling task.


Summary

  • The paper introduces an autoregressive model built on a "k-disks" tokenization scheme that discretizes trajectories with roughly 1 cm expected error.
  • It employs a GPT-like transformer architecture to capture intra-timestep interactions among vehicles, pedestrians, and cyclists.
  • Experiments on the Waymo Sim Agents Benchmark show state-of-the-art improvements of 3.3% on the realism meta metric and 9.9% on the interaction metric, along with favorable scaling and transfer behavior.

Trajeglish: Learning the Language of Driving Scenarios

Introduction

The paper introduces "Trajeglish," a novel autoregressive model designed for simulating dynamic driving scenarios by imitating the interactions among various road users, including vehicles, pedestrians, and cyclists. The model leverages the principles from discrete sequence modeling, akin to those used in natural language processing, to produce highly realistic traffic simulations. This research aims to bridge a critical gap in self-driving technology by enhancing simulation environments for autonomous vehicles (AVs).

Methodology

Data-Driven Tokenization: Trajeglish employs a data-driven tokenization scheme termed "k-disks" to discretize driving trajectories to centimeter-level accuracy. The scheme uses a small vocabulary of 384 tokens, enabling faithful modeling of motion data from the Waymo Open Motion Dataset (WOMD).
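A minimal sketch of this style of tokenization (illustrative names, not the paper's implementation): given a fixed vocabulary of template per-step displacements, each raw step maps to the index of its nearest template, and a trajectory is reconstructed by accumulating the chosen templates.

```python
import numpy as np

def tokenize_trajectory(steps, vocab):
    """steps: (T, 2) per-timestep (dx, dy) displacements in meters.
    vocab: (V, 2) template displacements, e.g. V = 384.
    Returns (T,) token ids, each the nearest template to that step."""
    # Squared distance from every step to every template action.
    d2 = ((steps[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def detokenize(tokens, vocab):
    """Reconstruct the trajectory by accumulating template displacements."""
    return np.cumsum(vocab[tokens], axis=0)
```

The discretization error is then the gap between the accumulated templates and the accumulated raw steps, which a well-chosen vocabulary keeps at the centimeter level.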

Transformer-Based Architecture: The model features a GPT-like transformer architecture that functions autoregressively in time. It also incorporates a mechanism to account for intra-timestep interaction between agents, thereby capturing the nuances in how road users influence each other's movement within a single timestep.
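One way to picture the factorization (a sketch assuming a fixed agent order and equal-length token histories, not the authors' exact scheme): tokens are flattened timestep-major, so within each timestep an agent's token is predicted conditioned on the tokens already emitted by earlier agents in that same timestep.

```python
def flatten_tokens(agent_tokens):
    """agent_tokens: list over agents of per-timestep token lists of
    equal length. Returns one flat sequence ordered timestep-major:
    at step t, agent i's token follows agents 0..i-1's tokens for the
    same step, so a causal model sees intra-timestep context."""
    num_steps = len(agent_tokens[0])
    seq = []
    for t in range(num_steps):
        for tokens in agent_tokens:
            seq.append(tokens[t])
    return seq
```

Under this ordering, standard causal attention suffices to condition each prediction on both past timesteps and same-timestep moves by earlier agents.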

Key Contributions and Results

The main contributions of the paper are:

  1. Tokenization Method: The k-disks approach for discretizing trajectory data achieves an expected discretization error of merely 1 cm, providing a granular and accurate representation of motion data.
  2. Transformer-Based Model: The proposed model conditions on map information and initial states of agents to produce a distribution over future actions. This enables dynamic interaction modeling that is particularly effective for simulating driving environments.
  3. State-of-the-Art Performance: When evaluated on the Waymo Sim Agents Benchmark, Trajeglish surpasses previous state-of-the-art models by 3.3% on the realism meta metric and by 9.9% on the interaction metric, demonstrating its ability to generate more realistic and interactive traffic scenarios.
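Since the model outputs a distribution over the discrete action vocabulary at each step, scenario generation reduces to repeatedly sampling one token. A hedged sketch of that sampling step (not the authors' code; a temperature parameter is a common knob for trading diversity against likelihood):

```python
import numpy as np

def sample_action(logits, temperature=1.0, rng=None):
    """Draw one motion token from unnormalized scores over the
    action vocabulary (e.g. 384-way). Lower temperature sharpens
    the distribution toward the most likely action."""
    rng = rng or np.random.default_rng()
    z = logits / temperature
    p = np.exp(z - z.max())  # subtract max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(p), p=p))
```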

Experimental Validation

The authors validate their model on multiple fronts:

  • Partial and Full Control Settings: Experiments demonstrate the robustness of Trajeglish in both full and partial control scenarios, where some agents are controlled by the model while the rest replay logged trajectories.
  • Scalability: The model's scaling behavior is evaluated with respect to parameter count and dataset size; performance improves with both, indicating that Trajeglish benefits from more extensive training data.
  • Transferability: The model's ability to generalize across different datasets is tested using the nuScenes dataset. Fine-tuning Trajeglish on nuScenes scenarios yields lower negative log-likelihood (NLL) compared to training a model from scratch, underscoring its adaptability.
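The partial-control setup can be sketched as follows (helper names are hypothetical): replayed agents emit their logged tokens, model-controlled agents sample from the model, and every token joins a shared history so later predictions condition on both kinds of agent.

```python
def rollout(model_sample, logged, controlled, num_steps):
    """model_sample(history) -> next token for a model-controlled agent.
    logged: {agent_id: [token per step]} replay tokens for every agent.
    controlled: set of agent ids driven by the model.
    Returns {agent_id: [token per step]} for the rolled-out scenario."""
    history = []
    out = {a: [] for a in logged}
    for t in range(num_steps):
        for a in sorted(logged):  # fixed intra-timestep agent order
            tok = model_sample(history) if a in controlled else logged[a][t]
            out[a].append(tok)
            history.append(tok)
    return out
```

In the full-control setting every agent id is in `controlled`; in replay-only evaluation the set is empty and the log is reproduced exactly.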

Implications and Future Directions

The practical implications of Trajeglish are noteworthy:

  • Improved Simulation Realism: By modeling interactions more accurately, Trajeglish can significantly enhance the quality of traffic simulations used for testing and developing self-driving systems.
  • Safety Enhancements: Enhanced simulation environments can potentially contribute to safer AV deployment by allowing for more rigorous and realistic testing.

From a theoretical perspective, Trajeglish pushes the boundary in modeling dynamic, multi-agent environments using discrete sequence techniques. The success of the k-disks tokenization approach also opens avenues for similar methods to be applied in other domains requiring fine-grained spatial and temporal modeling.

Future work may delve into further optimizing intra-timestep interactions, exploring larger datasets to fully leverage the model's scalability, and extending the model's application to more complex driving scenarios involving edge cases and rare events.

Conclusion

Trajeglish represents a sophisticated step forward in the domain of self-driving simulations. By combining granular tokenization with a robust autoregressive model, it achieves significant improvements in realism and interaction modeling. This work sets the stage for more advanced and safer simulation environments, critical for the development and deployment of autonomous vehicles.