Learning to Play Trajectory Games Against Opponents with Unknown Objectives

Published 24 Nov 2022 in cs.RO | (2211.13779v3)

Abstract: Many autonomous agents, such as intelligent vehicles, are inherently required to interact with one another. Game theory provides a natural mathematical tool for robot motion planning in such interactive settings. However, tractable algorithms for such problems usually rely on a strong assumption, namely that the objectives of all players in the scene are known. To make such tools applicable for ego-centric planning with only local information, we propose an adaptive model-predictive game solver, which jointly infers other players' objectives online and computes a corresponding generalized Nash equilibrium (GNE) strategy. The adaptivity of our approach is enabled by a differentiable trajectory game solver whose gradient signal is used for maximum likelihood estimation (MLE) of opponents' objectives. This differentiability of our pipeline facilitates direct integration with other differentiable elements, such as neural networks (NNs). Furthermore, in contrast to existing solvers for cost inference in games, our method handles not only partial state observations but also general inequality constraints. In two simulated traffic scenarios, we find superior performance of our approach over both existing game-theoretic methods and non-game-theoretic model-predictive control (MPC) approaches. We also demonstrate our approach's real-time planning capabilities and robustness in two hardware experiments.


Authors (3)

Citations (20)


Summary

The paper's main contribution is a differentiable trajectory game solver that infers opponents' objectives using a maximum likelihood approach.
It integrates forward and inverse games by solving the generalized Nash equilibrium problem as a mixed complementarity problem (MCP) and differentiating through its solution.
Experiments in 2-player and 7-player scenarios demonstrate real-time adaptability with improved safety and efficiency over traditional MPC methods.

Overview of "Learning to Play Trajectory Games Against Opponents with Unknown Objectives"

This paper introduces an adaptive model-predictive game play (MPGP) approach to strategic decision-making when the objectives of other agents are unknown. The target application is autonomous systems, such as intelligent vehicles, that must navigate among other agents with which interaction is unavoidable. The key contribution is a differentiable trajectory game solver that supports online inference of opponents' objectives, so that an agent can adapt its strategy in real time.

Problem Formulation

The authors consider an $N$-player general-sum trajectory game, formulated as a set of coupled trajectory optimization problems over continuous state and input spaces. The ego agent knows its own objective but not those of the other players. Traditional game-theoretic motion planning assumes that all players' objectives are known; the authors remove this assumption by inferring the opponents' objectives online from (possibly partial) state observations via maximum likelihood estimation (MLE).
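
The setup can be written compactly as follows. This is a sketch in our own notation, not necessarily the paper's: $\mathbf{x}, \mathbf{u}$ stack all players' state and input trajectories over a horizon $T$, $f^i$ are player $i$'s dynamics, $h^i$ are private constraints, $g$ collects shared constraints, and $\theta^i$ parameterizes player $i$'s objective $J^i$. Each player $i = 1, \dots, N$ solves

$$
\begin{aligned}
\min_{\mathbf{x}^i,\,\mathbf{u}^i}\quad & J^i\big(\mathbf{x}, \mathbf{u}; \theta^i\big) \\
\text{s.t.}\quad & x^i_{t+1} = f^i\big(x^i_t, u^i_t\big), \quad t = 1, \dots, T-1, \\
& h^i\big(\mathbf{x}^i, \mathbf{u}^i\big) \ge 0, \\
& g\big(\mathbf{x}, \mathbf{u}\big) \ge 0 .
\end{aligned}
$$

Taking the ego as player 1, the unknowns are $\boldsymbol\theta = (\theta^2, \dots, \theta^N)$. If $\mathbf{x}^\star(\boldsymbol\theta)$ denotes the GNE state trajectory for parameters $\boldsymbol\theta$, and observations $y_t = C\,x^\star_t + \varepsilon_t$ reveal only the components selected by $C$ under i.i.d. isotropic Gaussian noise $\varepsilon_t$, then

$$
\hat{\boldsymbol\theta} \in \arg\max_{\boldsymbol\theta}\; p\big(\mathbf{y} \mid \mathbf{x}^\star(\boldsymbol\theta)\big)
= \arg\min_{\boldsymbol\theta}\; \sum_{t} \big\| y_t - C\,x^\star_t(\boldsymbol\theta) \big\|_2^2 .
$$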

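The formulation supports general constraints: private constraints that involve a single player and shared constraints that couple several players. Collision avoidance is a typical shared constraint: it depends on the positions of two players at once, so neither player can satisfy it unilaterally, and this coupling is what makes the resulting equilibrium a generalized Nash equilibrium (GNE).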
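
To make "shared constraint" concrete, the following minimal sketch evaluates a pairwise collision-avoidance constraint along two planar position trajectories. The function names, the squared-distance form, and the minimum distance `d_min` are illustrative assumptions, not the paper's implementation; the point is that the same constraint value depends on both players' trajectories.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def collision_constraint(p_i: Point, p_j: Point, d_min: float) -> float:
    """Hypothetical shared constraint g = ||p_i - p_j||^2 - d_min^2 (feasible iff g >= 0).

    The squared distance avoids the non-smooth square root at p_i == p_j.
    """
    dx = p_i[0] - p_j[0]
    dy = p_i[1] - p_j[1]
    return dx * dx + dy * dy - d_min * d_min


def shared_constraint_values(
    traj_i: Sequence[Point], traj_j: Sequence[Point], d_min: float
) -> List[float]:
    """Evaluate the shared constraint at every time step of two equal-length trajectories."""
    if len(traj_i) != len(traj_j):
        raise ValueError("trajectories must have the same horizon")
    return [collision_constraint(a, b, d_min) for a, b in zip(traj_i, traj_j)]


def is_collision_free(
    traj_i: Sequence[Point], traj_j: Sequence[Point], d_min: float
) -> bool:
    """True iff the shared constraint holds at every time step."""
    return all(g >= 0.0 for g in shared_constraint_values(traj_i, traj_j, d_min))
```

For example, `ego = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]` and `other = [(2.0, 0.0), (2.0, 0.5), (2.0, 1.5)]` give constraint values `[3.0, 0.25, 1.25]` for `d_min = 1.0`, so the pair is collision-free; with `d_min = 1.2` the second time step becomes infeasible. In a game solver, each player's problem contains this same constraint, which is why neither player can satisfy it alone.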

Methodology

The proposed adaptive MPGP method integrates forward and inverse games. A forward game computes an equilibrium given known objectives, whereas an inverse game estimates objectives from observed behavior. Because the trajectory game solver is differentiable, the gradient of the observation likelihood with respect to the opponents' objective parameters can be propagated through the solver and used for MLE. The forward game is solved by stacking the first-order (KKT) optimality conditions of all players into a mixed complementarity problem (MCP), whose solution is a GNE of the generalized Nash equilibrium problem (GNEP) induced by the players' coupled objectives and constraints.
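
The following toy sketch illustrates what it means for an equilibrium to solve an MCP. It assumes a hypothetical 2-player static game with scalar actions $u_i \in [lo, hi]$ (private box constraints) and costs $J_i = \tfrac12 (u_i - \theta_i)^2 + \tfrac{w}{2} (u_i - u_j)^2$. Stacking $F_i(u) = \partial J_i / \partial u_i$ gives a box-constrained MCP, and $u$ solves it exactly when the natural-map residual $u - \Pi_{[lo,hi]}(u - F(u))$ vanishes. As a plainly swapped-in technique, the sketch finds the equilibrium by Gauss-Seidel best responses, which converge here because each player's coupling weight $w/(1+w)$ is below one; general MCP solvers such as the one used for trajectory games work differently.

```python
from typing import Tuple

Pair = Tuple[float, float]


def clip(v: float, lo: float, hi: float) -> float:
    """Projection onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def grad_own_cost(u: Pair, theta: Pair, w: float, i: int) -> float:
    """F_i(u) = dJ_i/du_i for J_i = 0.5*(u_i - theta_i)^2 + 0.5*w*(u_i - u_j)^2."""
    j = 1 - i
    return (u[i] - theta[i]) + w * (u[i] - u[j])


def solve_box_game(theta: Pair, w: float, lo: float, hi: float,
                   iters: int = 100, tol: float = 1e-12) -> Pair:
    """Gauss-Seidel best responses; each one is the clipped minimizer of a 1-D convex quadratic."""
    u = [0.0, 0.0]
    for _ in range(iters):
        prev = list(u)
        for i in (0, 1):
            j = 1 - i
            u[i] = clip((theta[i] + w * u[j]) / (1.0 + w), lo, hi)
        if max(abs(a - b) for a, b in zip(u, prev)) < tol:
            break
    return (u[0], u[1])


def mcp_residual(u: Pair, theta: Pair, w: float, lo: float, hi: float) -> float:
    """Natural-map residual max_i |u_i - clip(u_i - F_i(u))|; zero iff u solves the box MCP."""
    return max(abs(u[i] - clip(u[i] - grad_own_cost(u, theta, w, i), lo, hi))
               for i in (0, 1))
```

With `theta = (2.0, -1.0)`, `w = 1.0`, and bounds `[-0.5, 0.5]`, the equilibrium is `(0.5, -0.25)`: player 1 sits at its upper bound with $F_1 = -0.75$ (the bound's multiplier is $0.75$), while player 2 is interior with $F_2 = 0$, exactly the complementarity pattern the MCP encodes. Widening the bounds recovers the unconstrained equilibrium `(1.0, 0.0)`.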

Differentiable Solver: The differentiability of the game solver is a distinguishing feature, because it allows the solver to be composed with other differentiable modules such as neural networks. This could let the method go beyond pure objective inference and work in concert with learned components, such as vision-based modules or policy networks.
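
The mechanism behind differentiable inverse games can be illustrated on the same kind of toy game, restricted to an interior equilibrium (no active bounds). In this minimal sketch, which assumes a hypothetical 2-player scalar game with known coupling weight `w` and a known ego parameter, the stacked stationarity conditions are $G(u, \theta) = A u - \theta = 0$ with $A = \begin{bmatrix} 1+w & -w \\ -w & 1+w \end{bmatrix}$. The implicit function theorem then gives the solver's sensitivity $\partial u / \partial \theta_i = A^{-1} e_i$, and gradient descent on the Gaussian negative log-likelihood of noisy observations of the opponent's action yields the MLE of the opponent's parameter $\theta_2$. The names and the closed-form 2x2 solve are illustrative, not the paper's implementation.

```python
from typing import Sequence, Tuple

Pair = Tuple[float, float]


def solve_2x2(a11: float, a12: float, a21: float, a22: float,
              b1: float, b2: float) -> Pair:
    """Solve [[a11, a12], [a21, a22]] x = b by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0.0:
        raise ZeroDivisionError("singular system")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


def equilibrium(theta: Pair, w: float) -> Pair:
    """Forward game: interior Nash equilibrium from the stacked conditions A u = theta."""
    return solve_2x2(1.0 + w, -w, -w, 1.0 + w, theta[0], theta[1])


def equilibrium_sensitivity(w: float, i: int) -> Pair:
    """du/dtheta_i via the implicit function theorem on G(u, theta) = A u - theta = 0."""
    e = (1.0, 0.0) if i == 0 else (0.0, 1.0)
    return solve_2x2(1.0 + w, -w, -w, 1.0 + w, e[0], e[1])


def estimate_opponent_objective(theta_ego: float, observations: Sequence[float],
                                w: float, theta2_init: float = 0.0,
                                lr: float = 1.0, iters: int = 200) -> float:
    """Inverse game: MLE of theta_2 from noisy observations of the opponent's action u_2."""
    theta2 = theta2_init
    du2_dtheta2 = equilibrium_sensitivity(w, 1)[1]
    for _ in range(iters):
        u = equilibrium((theta_ego, theta2), w)
        # Gradient of the mean of 0.5*(u_2 - y)^2, i.e. the Gaussian NLL up to constants.
        mean_residual = sum(u[1] - y for y in observations) / len(observations)
        theta2 -= lr * mean_residual * du2_dtheta2
    return theta2
```

For `w = 1.0` and a known ego parameter `theta_ego = 2.0`, observations averaging `0.0` recover `theta2 = -1.0`, the parameter that generated the equilibrium `(1.0, 0.0)`. Here the equilibrium is linear in $\theta$, so the MLE is a convex least-squares problem; with active inequality constraints the sensitivity of a component pinned at its bound becomes zero, which is one reason handling inequality constraints inside a differentiable solver is nontrivial.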

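Practical Implications: The authors report that the method outperforms non-game-theoretic MPC as well as game-theoretic approaches that do not adaptively infer objectives. Because shared constraints such as collision avoidance are enforced within the equilibrium computation, the ego plan satisfies them with respect to the predicted opponent trajectories; the resulting safety is therefore conditional on the accuracy of those predictions, and the authors report robust, real-time behavior even in dense interaction scenarios.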

Experimental Evaluation

The authors evaluate their approach both in simulation and on hardware. The simulated scenarios are a 2-player tracking task and a more complex 7-player ramp-merging scenario, in which vehicles must negotiate entry onto a busy roadway. These experiments show that the proposed method infers and adapts to opponents' objectives, leading to safer and more efficient interactions than the baselines.

Comparison with Baselines: The paper benchmarks adaptive MPGP against a non-game-theoretic MPC baseline and a game-theoretic baseline whose solver handles equality but not inequality constraints, reporting better interaction efficiency, lower collision rates, and greater overall robustness.

Real-time Capability: The experiments also validate real-time applicability: the computation is efficient enough to run on existing robotic hardware, such as mobile ground robots, in environments shared with human and robotic agents.

Implications and Future Directions

The paper opens several avenues for further exploration. Primarily, the differentiable nature of the solver invites integration with advanced machine learning techniques, enabling richer interaction models. Future work could address multi-modal uncertainty in opponent behavior and extend the framework to infer other parameters, including those that appear in the constraints themselves.

Furthermore, evaluating the approach on real-world vehicle datasets or extending it to other domains, such as human-robot interaction, could reveal additional performance characteristics and scalability limits. Combined with end-to-end learning, the solver could also serve as a building block for interaction-aware neural architectures, linking game-theoretic planning with learning-based control of autonomous systems.

In summary, the research provides a technically intricate yet practically significant advancement in trajectory game theory, delivering an adaptable solution for autonomous planning tasks amidst uncertainty about other agents' intentions. As this field grows, such adaptive models could become pivotal in advancing the safety and efficiency of autonomous systems in mixed-traffic or densely populated environments.
