Blending Data-Driven Priors in Dynamic Games (2402.14174v3)
Abstract: As intelligent robots like autonomous vehicles become increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. However, in reality, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisy-rational paradigm. In this work, we investigate a principled methodology to blend a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, an algorithm for solving non-cooperative dynamic games with Kullback-Leibler (KL) regularization with respect to a general, stochastic, and possibly multi-modal reference policy. Our method incorporates, for each decision maker, a tunable parameter that permits modulation between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multi-modal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies can more effectively incorporate guidance from the reference policy and account for noisily-rational human behaviors than non-regularized baselines. Website with additional information, videos, and code: https://kl-games.github.io/.
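To make the blending mechanism concrete, here is a minimal sketch of the per-player objective implied by the abstract; the notation is ours and illustrative, not taken from the paper. Each player $i$ minimizes its task cost plus a KL penalty tying its strategy to the data-driven reference $\hat{\pi}^i$, with a tunable weight $\lambda^i \geq 0$:

$$
\min_{\pi^i}\ \mathbb{E}\!\left[\,\sum_{t=0}^{T-1} c^i_t\big(x_t, u^1_t, \dots, u^N_t\big) \;+\; \lambda^i\, D_{\mathrm{KL}}\!\big(\pi^i_t(\cdot \mid x_t)\,\big\|\,\hat{\pi}^i_t(\cdot \mid x_t)\big)\right]
$$

Under this reading, $\lambda^i \to 0$ recovers a purely task-driven dynamic game, while large $\lambda^i$ drives player $i$'s strategy toward the data-driven reference; this is one plausible instantiation of the "tunable parameter that permits modulation between task-driven and data-driven behaviors" described above.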