
Blending Data-Driven Priors in Dynamic Games (2402.14174v3)

Published 21 Feb 2024 in cs.RO, cs.AI, cs.SY, eess.SY, and math.OC

Abstract: As intelligent robots like autonomous vehicles become increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. However, in reality, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisy-rational paradigm. In this work, we investigate a principled methodology to blend a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, an algorithm for solving non-cooperative dynamic games with Kullback-Leibler (KL) regularization with respect to a general, stochastic, and possibly multi-modal reference policy. Our method incorporates, for each decision maker, a tunable parameter that permits modulation between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multi-modal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies can more effectively incorporate guidance from the reference policy and account for noisily-rational human behaviors than non-regularized baselines. Website with additional information, videos, and code: https://kl-games.github.io/.
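
To illustrate the core blending idea, below is a minimal, self-contained sketch, not the paper's solver. KLGame computes feedback Nash equilibria of a full multi-agent dynamic game; this toy shows only the single-step effect of KL regularization against a reference policy, assuming a discrete action set and the standard closed form for KL-regularized objectives, where the optimal policy weights the reference policy by an exponentiated cost-to-go. The function name, the blending parameter `lam`, and the toy numbers are illustrative assumptions, not from the paper.

```python
import numpy as np

def kl_regularized_policy(q_values, ref_policy, lam):
    """Blend a task-driven value function with a data-driven reference policy.

    q_values:   task costs-to-go Q(x, u), one entry per discrete action u
    ref_policy: reference action distribution pi_ref(u | x) (sums to 1)
    lam:        tunable blending weight (hypothetical name):
                small lam -> task-driven (greedy in Q),
                large lam -> data-driven (follows the reference policy)
    """
    # Illustrative closed form: pi*(u|x) proportional to
    # pi_ref(u|x) * exp(-Q(x,u) / lam)
    logits = np.log(ref_policy) - q_values / lam
    logits -= logits.max()          # subtract max for numerical stability
    policy = np.exp(logits)
    return policy / policy.sum()

# Toy example: three candidate maneuvers with task costs and a
# multi-modal data-driven prior (all values made up for illustration).
q = np.array([1.0, 0.2, 3.0])       # task cost of each maneuver
pi_ref = np.array([0.6, 0.1, 0.3])  # learned reference policy

print(kl_regularized_policy(q, pi_ref, lam=0.1))   # ~greedy in task cost
print(kl_regularized_policy(q, pi_ref, lam=10.0))  # ~recovers pi_ref
```

The per-agent parameter plays the role described in the abstract: driving it toward zero recovers purely task-driven (game-optimal) behavior, while driving it large makes the agent defer to the data-driven prior, which is how noisily-rational human behavior can be absorbed into the game.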
