Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning (arXiv:2405.02425v1)
Abstract: We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially observable, multi-agent domain. We rely on large-scale, simulation-based data generation to obtain complex behaviors from egocentric vision that transfer successfully to physical robots equipped with low-cost sensors. To achieve adequate visual realism, our simulation combines rigid-body physics with learned, realistic rendering via multiple Neural Radiance Fields (NeRFs). We combine teacher-based multi-agent RL and cross-experiment data reuse to enable the discovery of sophisticated soccer strategies. We analyze active-perception behaviors, including object tracking and ball seeking, that emerge when simply optimizing for perception-agnostic soccer play. The agents achieve levels of performance and agility equivalent to those of policies with access to privileged, ground-truth state. To our knowledge, this paper constitutes a first demonstration of end-to-end training for multi-agent robot soccer, mapping raw pixel observations to joint-level actions, that can be deployed in the real world. Videos of the gameplay and analyses are available on our website (https://sites.google.com/view/vision-soccer).
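The "cross-experiment data reuse" mentioned in the abstract can be illustrated with a small replay structure: an off-policy learner draws each training batch partly from transitions logged by earlier experiments and partly from the current run's own buffer. The sketch below assumes a fixed mixing fraction; `MixedReplay`, `Transition`, and `prior_fraction` are hypothetical names, and the 0.5 default is illustrative rather than a value reported in the paper.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class Transition:
    """One environment step; this field layout is an illustrative assumption."""
    obs: Any
    action: Any
    reward: float
    next_obs: Any
    done: bool


class MixedReplay:
    """Hypothetical replay buffer mixing data from earlier experiments with
    data collected by the current run (a sketch, not the paper's code)."""

    def __init__(self, prior_data: Sequence[Transition], capacity: int,
                 prior_fraction: float = 0.5, seed: int = 0) -> None:
        if not 0.0 <= prior_fraction <= 1.0:
            raise ValueError("prior_fraction must lie in [0, 1]")
        self.prior_data = list(prior_data)    # fixed transitions logged by previous experiments
        self.online = deque(maxlen=capacity)  # FIFO buffer; oldest online data is evicted first
        self.prior_fraction = prior_fraction  # assumed fixed share of each batch taken from prior data
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        """Store a transition generated by the current experiment."""
        self.online.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a batch (with replacement) from both data sources."""
        if not self.prior_data and not self.online:
            raise ValueError("no data to sample from")
        if not self.online:
            n_prior = batch_size  # start of a run: only prior data exists yet
        elif not self.prior_data:
            n_prior = 0           # no earlier experiments: ordinary off-policy replay
        else:
            n_prior = round(batch_size * self.prior_fraction)
        n_online = batch_size - n_prior
        batch = [self.rng.choice(self.prior_data) for _ in range(n_prior)]
        batch += [self.rng.choice(self.online) for _ in range(n_online)]
        self.rng.shuffle(batch)  # avoid ordering the batch by data source
        return batch
```

A learner would call `sample` at every update step and pass the batch to its loss. Because the prior transitions were generated by other policies, this kind of reuse presupposes an off-policy algorithm that can learn from data it did not collect itself.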
- Dhruva Tirumala
- Markus Wulfmeier
- Ben Moran
- Sandy Huang
- Jan Humplik
- Guy Lever
- Tuomas Haarnoja
- Leonard Hasenclever
- Arunkumar Byravan
- Nathan Batchelor
- Neil Sreendra
- Kushal Patel
- Marlon Gwira
- Francesco Nori
- Martin Riedmiller
- Nicolas Heess