
Game-theoretic Objective Space Planning (2209.07758v2)

Published 16 Sep 2022 in cs.RO and cs.AI

Abstract: Generating competitive strategies and performing continuous motion planning simultaneously in an adversarial setting is a challenging problem. In addition, understanding the intent of other agents is crucial to deploying autonomous systems in adversarial multi-agent environments. Existing approaches either discretize agent actions by grouping similar control inputs, sacrificing performance in motion planning, or plan in uninterpretable latent spaces, producing hard-to-understand agent behaviors. Furthermore, the most popular policy optimization frameworks do not recognize the long-term effect of actions and become myopic. This paper proposes an agent action discretization method via abstraction that makes agent intentions explicit, an efficient offline pipeline for agent population synthesis, and a planning strategy using counterfactual regret minimization with function approximation. Finally, we experimentally validate our findings on scaled autonomous vehicles in a head-to-head racing setting. We demonstrate that the proposed framework significantly improves learning and win rates against different opponents, and that these improvements transfer to unseen opponents in an unseen environment.
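To make the planning strategy concrete, the sketch below illustrates regret matching over a small discretized action set, the basic update underlying counterfactual regret minimization. It is a minimal illustration under assumptions, not the paper's pipeline: the action names and the `sampled_utilities` function are hypothetical stand-ins for the abstracted agent actions and the learned counterfactual value estimates described in the abstract.

```python
import numpy as np

# Minimal regret-matching sketch over a discretized action set.
# Illustrative only: action labels and the utility model are hypothetical,
# standing in for the paper's action abstraction and value approximation.

rng = np.random.default_rng(0)

ACTIONS = ["hold_line", "swing_out", "block", "dive_inside"]  # hypothetical abstract actions
cumulative_regret = np.zeros(len(ACTIONS))
cumulative_strategy = np.zeros(len(ACTIONS))


def current_strategy(regret: np.ndarray) -> np.ndarray:
    """Regret matching: play positive regrets proportionally, else uniform."""
    positive = np.maximum(regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(regret), 1.0 / len(regret))


def sampled_utilities() -> np.ndarray:
    """Hypothetical per-action utilities for one encounter (a stand-in for
    counterfactual values produced by a learned function approximator)."""
    return rng.normal(loc=[0.1, 0.3, -0.2, 0.2], scale=0.5)


for t in range(10_000):
    strategy = current_strategy(cumulative_regret)
    cumulative_strategy += strategy
    utilities = sampled_utilities()            # utility of each action this round
    expected = float(strategy @ utilities)     # value of the current mixed strategy
    cumulative_regret += utilities - expected  # regret of not having played each action

# The time-averaged strategy is what converges under regret matching.
average_strategy = cumulative_strategy / cumulative_strategy.sum()
print(dict(zip(ACTIONS, np.round(average_strategy, 3))))
```

In this toy setting the averaged strategy concentrates on the actions with the highest expected utility; in the paper's setting, the utilities would instead come from the opponent-aware racing objective, and function approximation replaces the tabular regret accumulators.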
