AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting

Published 25 Mar 2021 in cs.AI, cs.CV, cs.LG, cs.MA, and cs.RO | (2103.14023v3)

Abstract: Predicting accurate future trajectories of multiple agents is essential for autonomous systems, but is challenging due to the complex agent interaction and the uncertainty in each agent's future behavior. Forecasting multi-agent trajectories requires modeling two key dimensions: (1) time dimension, where we model the influence of past agent states over future states; (2) social dimension, where we model how the state of each agent affects others. Most prior methods model these two dimensions separately, e.g., first using a temporal model to summarize features over time for each agent independently and then modeling the interaction of the summarized features with a social model. This approach is suboptimal since independent feature encoding over either the time or social dimension can result in a loss of information. Instead, we would prefer a method that allows an agent's state at one time to directly affect another agent's state at a future time. To this end, we propose a new Transformer, AgentFormer, that jointly models the time and social dimensions. The model leverages a sequence representation of multi-agent trajectories by flattening trajectory features across time and agents. Since standard attention operations disregard the agent identity of each element in the sequence, AgentFormer uses a novel agent-aware attention mechanism that preserves agent identities by attending to elements of the same agent differently than elements of other agents. Based on AgentFormer, we propose a stochastic multi-agent trajectory prediction model that can attend to features of any agent at any previous timestep when inferring an agent's future position. The latent intent of all agents is also jointly modeled, allowing the stochasticity in one agent's behavior to affect other agents. Our method substantially improves the state of the art on well-established pedestrian and autonomous driving datasets.

Citations (374)

Summary

  • The paper introduces an agent-aware attention mechanism in a Transformer architecture to distinctly model features of individual agents and their interactions.
  • It unifies socio-temporal modeling by representing trajectories as a flattened sequence of agent-timestep pairs to effectively capture inter-agent influences.
  • It integrates a stochastic forecasting framework via a Conditional Variational Autoencoder, achieving notable improvements on ETH/UCY and nuScenes benchmarks.

AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting

The paper "AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting" addresses the complex task of trajectory prediction for multiple agents, focusing on enhancing the performance of autonomous systems such as self-driving vehicles. Multi-agent trajectory forecasting is inherently challenging due to the intricate interactions between agents and the associated uncertainties in predicting individual trajectories. The authors propose a novel Transformer-based model, termed AgentFormer, which aims to concurrently model the temporal and social dimensions of agent trajectories. This unified approach contrasts with prior methods that typically treat these dimensions separately.
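The core representational idea can be sketched in a few lines: multi-agent trajectories over T timesteps are flattened into a single sequence of agent-timestep elements, so that attention can connect any agent at any time to any other agent at any other time. The shapes and ordering below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

# Sketch: flattening multi-agent trajectories into one socio-temporal
# sequence. Shapes are illustrative assumptions.
N, T, d = 3, 4, 2                # agents, timesteps, state dim (e.g., x, y)
traj = np.arange(N * T * d, dtype=float).reshape(T, N, d)  # (T, N, d)

# Flatten time-major: sequence element i belongs to agent i % N at
# timestep i // N, so a single attention operation spans both the time
# and the social dimension.
seq = traj.reshape(T * N, d)
print(seq.shape)  # (12, 2)
```

With this ordering, agent identity is recoverable from the element index alone (i mod N), which is what the agent-aware attention mechanism exploits.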

Technical Contributions

  1. Agent-Aware Attention: The central innovation in AgentFormer is the agent-aware attention mechanism within the Transformer architecture. This mechanism enables the model to maintain distinctions between features of the same agent and features of other agents, preserving agent identities throughout the sequence modeling process. Traditional attention mechanisms in Transformers are agnostic to the identity of agents, potentially leading to information loss when applied to multi-agent scenarios. The agent-aware attention modifies how attention weights are computed, taking into account whether features belong to the same agent or different agents.
  2. Unified Socio-Temporal Modeling: By representing multi-agent trajectories in a flattened sequence of agent-timestep pairs, AgentFormer performs joint socio-temporal modeling. This design choice allows the model to consider an agent's state influence at one time and its impact on another agent's future state directly, eschewing intermediate summary steps that could obfuscate dependencies.
  3. Stochastic Forecasting Framework: Incorporating a Conditional Variational Autoencoder (CVAE) structure, AgentFormer models the latent intents of agents and infers future trajectory distributions conditioned on these intents. Importantly, the latent intents are jointly modeled, allowing for the stochasticity in one agent's behavior to potentially affect others, thereby informing a more socially coherent prediction of trajectories.
  4. Empirical Evaluation: The proposed method is evaluated on benchmarks such as the ETH/UCY pedestrian datasets and the nuScenes autonomous driving dataset. The results demonstrate that AgentFormer achieves substantial improvements over state-of-the-art methods, particularly in final displacement error, underscoring its efficacy for long-horizon prediction.
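The agent-aware attention described in item 1 can be illustrated with a minimal single-head sketch: two separate query/key projection pairs are learned, and a mask built from element indices (same agent iff i mod N = j mod N) selects intra-agent or inter-agent scores per pair. Projection names and dimensions here are assumptions for illustration, not the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def agent_aware_attention(X, W, N):
    """Single-head agent-aware attention over a flattened (N*T, d) sequence.

    X: features ordered so element i belongs to agent i % N.
    W: dict of projection matrices; separate query/key projections are
    used for intra-agent vs inter-agent attention (illustrative names).
    """
    L, d = X.shape
    Q_self, K_self = X @ W["q_self"], X @ W["k_self"]
    Q_other, K_other = X @ W["q_other"], X @ W["k_other"]
    V = X @ W["v"]
    idx = np.arange(L)
    # Intra-agent mask: elements i, j refer to the same agent.
    same = (idx[:, None] % N) == (idx[None, :] % N)
    # Use the "self" score for same-agent pairs, "other" score otherwise.
    scores = np.where(same, Q_self @ K_self.T, Q_other @ K_other.T) / np.sqrt(d)
    return softmax(scores, axis=-1) @ V

# Toy usage: 3 agents, 4 timesteps, feature dim 8.
rng = np.random.default_rng(0)
N, T, d = 3, 4, 8
X = rng.standard_normal((N * T, d))
W = {k: rng.standard_normal((d, d)) * 0.1
     for k in ["q_self", "k_self", "q_other", "k_other", "v"]}
out = agent_aware_attention(X, W, N)
print(out.shape)  # (12, 8)
```

A standard Transformer would use one projection pair for all index pairs; the split lets the model weight an agent's own history differently from other agents' histories while still attending over the whole socio-temporal sequence.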

Implications and Future Directions

The introduction of AgentFormer provides a notable advancement in the field of trajectory forecasting, particularly for autonomous systems with high demands for safety and precision. By jointly capturing the temporal and social dynamics of multi-agent interactions, AgentFormer offers a tool that can enhance the reliability of autonomous navigation systems.

Future research could explore adapting AgentFormer to incorporate additional sensor modalities, such as LiDAR and camera inputs, to improve its robustness across varied environments. Furthermore, deeper study of the interpretability of its attention mechanisms could yield insight into the decision-making processes of autonomous systems, enabling stronger safety assurances and trustworthiness in practical applications.

Overall, AgentFormer represents a sophisticated approach to multi-agent trajectory forecasting that leverages the strengths of Transformer architectures. It opens pathways for more integrated and holistic modeling strategies that can facilitate the development of intelligent, cooperative, and autonomous agents within complex and dynamic environments.


Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a concise list of unresolved issues and open research questions that future work could address to strengthen, generalize, or clarify the findings of this paper.

  • Scalability and efficiency: The attention over flattened socio-temporal sequences scales as O(N²T(H+1)) in memory and compute. No analysis is given of runtime, memory footprint, or practical limits for large N, long horizons T, or long histories H; efficient attention methods (e.g., sparse or low-rank) remain unexplored.
  • Dynamic agent sets: The formulation assumes a fixed set of N agents per scene window and flattens sequences accordingly. How the model handles agent births/deaths, missing observations, or variable N over time (padding, masking, reindexing) is not described or evaluated.
  • Permutation invariance: Agent-aware attention uses a mask based on index modulo N (i mod N = j mod N). It is unclear whether the model is truly permutation invariant to agent ordering across timesteps and between past/future sequences, or if invariance depends on consistent ordering conventions; a formal proof or empirical test is missing.
  • Connectivity modeling: The rule-based connectivity mask relies on a single distance threshold at the current time t=0 (η=100). There is no analysis of sensitivity to η, no learned or time-varying connectivity, and no modeling of non-local or delayed interactions (e.g., anticipatory interactions, gaze, intent cues).
  • Latent variable factorization: Although the paper claims “joint latent intent modeling,” both the prior and posterior factorize across agents. There is no explicit non-factorized (coupled) latent distribution over all agents. The extent to which inter-agent intent dependencies are captured solely via the decoder’s attention remains unclear; exploring coupled priors/posteriors is an open direction.
  • Autoregressive training and exposure bias: The decoder is autoregressive at inference, but training procedures (teacher forcing, scheduled sampling, beam search) are not detailed. How exposure bias affects long-horizon accuracy and social compliance is unknown.
  • Uncertainty modeling: The conditional likelihood uses an isotropic Gaussian (I/β) and evaluates with ADE/FDE only. There is no calibration assessment (e.g., NLL, CRPS), anisotropic covariance modeling, or uncertainty decomposition (aleatoric vs epistemic).
  • Diversity sampler coupling: The sampler generates per-agent latent codes via independent Gaussian noises ε_n and linear transforms {A_nk, b_nk}. It’s unclear if this suffices to capture correlated multi-agent mode configurations; modeling a joint latent distribution over all agents is not explored.
  • Semantic map usage: On ETH/UCY, maps are omitted “for fair comparisons,” but this leaves unanswered how scene semantics affect performance and generalization. On nuScenes, only local per-agent map crops (rotated to heading) are used; global context, dynamic elements (e.g., traffic signals), and map inaccuracies are not studied.
  • Heterogeneous agent interactions: Pedestrians and vehicles are evaluated separately. Mixed-type, cross-class interactions (e.g., vehicles and pedestrians in shared spaces) and role-specific attention mechanisms are not explored.
  • Physical feasibility and social compliance: The model qualitatively shows non-colliding trajectories, but quantitative metrics (collision rate, lane adherence, off-road rate, traffic-rule compliance, comfort/smoothness) are not reported.
  • Robustness to perception noise: The model assumes accurate past trajectories and headings. Robustness to detection/tracking noise, occlusions, missed observations, and asynchronous sampling is not evaluated.
  • Irregular sampling and multi-frequency data: Time encoding uses sinusoidal functions of discrete timesteps; handling irregular sampling rates or resampling across datasets (2.5 Hz ETH/UCY vs 2 Hz nuScenes) is not investigated.
  • Handling long-horizon prediction: Pedestrians are evaluated at 4.8s and vehicles at 6s. Performance beyond these horizons, degradation patterns, and strategies to stabilize long-term forecasts (e.g., hierarchical decoding) are unknown.
  • Model interpretability: Aside from a single qualitative attention visualization, there is no systematic analysis of what agent-aware attention learns (e.g., attention patterns across interaction types, reliability of attention as causal explanations).
  • Hyperparameter sensitivity: Key hyperparameters (η, d_k/d_v/d_τ, number of heads, latent dimension d_z, KL clipping to 2, sampler σ_d) lack sensitivity analysis; robustness to these choices is unclear.
  • Agent-aware attention expressivity: The design uses only two projections (intra-agent vs inter-agent). More expressive relation-specific attention (e.g., learned relation types, distance-aware, behavior states) is not investigated.
  • Learning connectivity vs masking: Connectivity is hard-coded via distance threshold and optional masking. Learning interaction graphs (e.g., via GNNs, attention scores) or integrating rule-based and learned connectivity remains an open question.
  • Training stability and convergence: Agent-aware attention involves masked operations; potential training instabilities, gradient flow issues, or convergence behavior relative to standard attention are not analyzed.
  • Failure mode characterization: The paper reports average improvements but does not analyze failure cases (e.g., dense crowds, sharp turns, stop-and-go, merges) or when AgentFormer underperforms baselines.
  • Generalization across datasets: Evaluation is limited to ETH/UCY and nuScenes. Generalization to other benchmarks (e.g., Argoverse, Lyft, Waymo, MOT datasets), different map formats, and diverse geographies is not assessed.
  • Kinematic constraints and orientation: Output focuses on 2D positions; explicit modeling of orientation, speed, acceleration, and kinematic constraints (especially for vehicles) is limited. Effects on downstream planning or controller compatibility are not discussed.
  • Joint training of sampler and CVAE: The sampler is trained after freezing CVAE weights. The potential benefits of joint or alternating training, and the impact on sample diversity, coverage, and accuracy, are not studied.
  • Evaluation metrics breadth: ADE/FDE are used primarily. Broader evaluation (NLL, HIT rate, minADE/FDE with standardized protocols, risk-sensitive metrics, distributional tests) and statistical significance testing are absent.
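Several of the gaps above concern the rule-based connectivity mask, which admits attention between two agents only if their distance at the current timestep falls below a single threshold η. A minimal sketch of that rule, assuming 2D positions at t = 0 and the threshold value η = 100 mentioned above (`eta` below):

```python
import numpy as np

def connectivity_mask(positions, eta=100.0):
    """Rule-based connectivity: agents i and j may attend to each other
    only if their Euclidean distance at the current timestep (t = 0)
    is below the threshold eta. positions: (N, 2) array."""
    diff = positions[:, None, :] - positions[None, :, :]   # pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)                   # (N, N) distances
    return dist < eta

# Toy usage: agents 0 and 1 are close; agent 2 is far from both.
pos = np.array([[0.0, 0.0], [3.0, 4.0], [200.0, 0.0]])
mask = connectivity_mask(pos, eta=100.0)
print(mask)
```

The hard threshold makes the sensitivity questions above concrete: connectivity is symmetric, evaluated only at t = 0, and switches discontinuously at distance η, so anticipatory or delayed interactions beyond the threshold are cut off entirely.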

