Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning (2405.02425v1)

Published 3 May 2024 in cs.RO and cs.AI

Abstract: We apply multi-agent deep reinforcement learning (RL) to train end-to-end robot soccer policies with fully onboard computation and sensing via egocentric RGB vision. This setting reflects many challenges of real-world robotics, including active perception, agile full-body control, and long-horizon planning in a dynamic, partially-observable, multi-agent domain. We rely on large-scale, simulation-based data generation to obtain complex behaviors from egocentric vision which can be successfully transferred to physical robots using low-cost sensors. To achieve adequate visual realism, our simulation combines rigid-body physics with learned, realistic rendering via multiple Neural Radiance Fields (NeRFs). We combine teacher-based multi-agent RL and cross-experiment data reuse to enable the discovery of sophisticated soccer strategies. We analyze active-perception behaviors including object tracking and ball seeking that emerge when simply optimizing perception-agnostic soccer play. The agents display equivalent levels of performance and agility as policies with access to privileged, ground-truth state. To our knowledge, this paper constitutes a first demonstration of end-to-end training for multi-agent robot soccer, mapping raw pixel observations to joint-level actions, that can be deployed in the real world. Videos of the game-play and analyses can be seen on our website https://sites.google.com/view/vision-soccer .

Authors (16)
  1. Dhruva Tirumala
  2. Markus Wulfmeier
  3. Ben Moran
  4. Sandy Huang
  5. Jan Humplik
  6. Guy Lever
  7. Tuomas Haarnoja
  8. Leonard Hasenclever
  9. Arunkumar Byravan
  10. Nathan Batchelor
  11. Neil Sreendra
  12. Kushal Patel
  13. Marlon Gwira
  14. Francesco Nori
  15. Martin Riedmiller
  16. Nicolas Heess

Summary

End-to-End Reinforcement Learning for Onboard Vision-Based Robot Soccer

Introduction

In a departure from traditional robot soccer approaches, the researchers use multi-agent deep reinforcement learning (RL) to train end-to-end soccer policies that rely solely on onboard sensing: the robots navigate the field with nothing more than a head-mounted camera and internal sensors such as an IMU and joint encoders.

The Core of the Training Method

Autonomous navigation and strategy execution in this dynamic, partially observable environment are achieved by learning directly from egocentric RGB vision. The major components of the training pipeline are:

  • Simulation and Neural Radiance Fields (NeRFs): The simulator combines rigid-body physics with learned, realistic rendering. The static soccer environment is reconstructed as a NeRF from hundreds of photographs of the real scene, producing a realistic background that can be rendered from any viewpoint; dynamic objects such as the ball and opposing robots are rendered separately and composited into this static scene during simulation (a compositing sketch follows this list).
  • Reinforcement Learning Pipeline: Separate 'expert' agents are first trained for narrower tasks, such as getting up from the ground or scoring against a stationary opponent. These experts are then distilled into a single generalist agent capable of handling the full complexity of a soccer match (an illustrative distillation term appears after this list).
  • Memory and Active Vision: Because a single egocentric frame reveals only part of the scene, the agents rely on recurrent memory (LSTM layers) to integrate observations over time, tracking and anticipating the positions of the ball and opponents even when they temporarily leave the field of view (a minimal policy sketch closes out the examples below).
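
To make the rendering step concrete, here is a minimal sketch of how a composited egocentric observation could be produced: a static NeRF render of the pitch supplies the background, and separately rendered dynamic objects (the ball, other robots) are layered on top with a per-pixel depth test. The function name, the depth-test compositing rule, and the resolution are illustrative assumptions, not the paper's exact pipeline; in such a setup the physics engine supplies the object poses, and the same camera pose drives both renderers.

```python
import numpy as np

def composite_observation(nerf_rgb, nerf_depth, dyn_rgb, dyn_depth):
    """Layer dynamic objects over a static NeRF background.

    nerf_rgb, dyn_rgb: (H, W, 3) images rendered for the same egocentric
    camera pose; nerf_depth, dyn_depth: (H, W) depth maps. A per-pixel
    depth test keeps whichever surface is closer to the camera (an
    assumed compositing rule for illustration).
    """
    closer = (dyn_depth < nerf_depth)[..., None]
    return np.where(closer, dyn_rgb, nerf_rgb)

# Hypothetical usage with stubbed renderers (placeholder resolution):
H, W = 60, 80
nerf_rgb = np.zeros((H, W, 3)); nerf_depth = np.full((H, W), 10.0)
dyn_rgb = np.ones((H, W, 3));   dyn_depth = np.full((H, W), np.inf)
dyn_depth[25:35, 35:45] = 2.0   # e.g. the ball occupying part of the view
obs = composite_observation(nerf_rgb, nerf_depth, dyn_rgb, dyn_depth)
```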
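
Distilling task experts into a single generalist can be realized in several ways; one common choice is a KL term that pulls the student's action distribution toward a frozen teacher's in the states that teacher covers. The sketch below assumes diagonal-Gaussian policies over joint targets and is an illustration of that idea, not the paper's exact objective.

```python
import torch
from torch.distributions import Normal, Independent, kl_divergence

def teacher_distillation_loss(student_mean, student_std, teacher_mean, teacher_std):
    """KL(teacher || student), averaged over the batch, for diagonal-Gaussian
    joint-action policies. Regularizing the generalist toward frozen task
    experts is one standard realization of teacher-based training; the
    paper's actual objective may differ."""
    student = Independent(Normal(student_mean, student_std), 1)
    teacher = Independent(Normal(teacher_mean, teacher_std), 1)
    return kl_divergence(teacher, student).mean()

# Hypothetical usage: 20 joint targets, batch of 8 states.
B, A = 8, 20
loss = teacher_distillation_loss(
    torch.zeros(B, A, requires_grad=True), torch.ones(B, A),  # student head
    torch.randn(B, A), 0.5 * torch.ones(B, A),                # frozen teacher
)
loss.backward()  # in practice combined with the RL objective
```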
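
Finally, a minimal sketch of the kind of recurrent vision-to-action policy described above: a small CNN encodes each egocentric RGB frame, an LSTM fuses the encoding with proprioception over time, and a linear head outputs joint targets. All layer sizes, input dimensions, and the overall structure are placeholder assumptions; the paper's actual network differs in detail.

```python
import torch
import torch.nn as nn

class RecurrentVisionPolicy(nn.Module):
    """Egocentric RGB + proprioception -> joint-level actions. The LSTM
    state provides the memory needed to keep tracking objects that leave
    the camera's view. All sizes are illustrative placeholders."""

    def __init__(self, proprio_dim=30, action_dim=20, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),
        )
        self.lstm = nn.LSTM(128 + proprio_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, action_dim)

    def forward(self, rgb, proprio, state=None):
        # rgb: (B, T, 3, H, W) frame sequence; proprio: (B, T, proprio_dim).
        B, T = rgb.shape[:2]
        feats = self.encoder(rgb.flatten(0, 1)).reshape(B, T, -1)
        hidden, state = self.lstm(torch.cat([feats, proprio], dim=-1), state)
        return self.head(hidden), state  # joint targets per time step

# Hypothetical usage on a short sequence of low-resolution frames:
policy = RecurrentVisionPolicy()
actions, state = policy(torch.rand(1, 4, 3, 60, 80), torch.rand(1, 4, 30))
```

The recurrent state is what allows the policy to hold estimates of the ball and opponents while they are out of sight, which is the behavior the summary highlights under "Memory and Active Vision".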

How Do the Robots Perform?

The robots demonstrate impressive soccer skills:

  • Agility and Strategy Execution: They walk, turn, and kick with agility comparable to agents trained with privileged access to ground-truth game state.
  • Emergent Behavior: Without being instructed to do so, the robots actively seek out the ball, position themselves strategically against opponents, and even block shots; these behaviors emerge purely from optimizing the perception-agnostic objective of playing soccer well.
  • Object Tracking and Active Vision: The robots track the ball and estimate their position on the field using only the onboard camera, despite its low-resolution input.

What Does This Mean for the Future of Robotics?

The implications of this research go beyond soccer: it points toward robotics applications that cannot rely on expensive or impractical external sensing. For instance, search-and-rescue robots or autonomous delivery drones operating in complex, dynamically changing environments could benefit significantly from the same approach.

Looking Ahead: Challenges and Opportunities

Despite these successes, transferring policies trained in simulation to the real world remains challenging. Differences in lighting conditions, unexpected physical interactions (such as robots bumping into one another), and hardware limitations (such as camera quality or onboard processing power) can degrade performance or require additional adaptation.

Conclusion

Training robots end-to-end to play soccer from onboard vision alone is both challenging and fascinating. As research progresses, the combination of more realistic simulation and more capable learning methods promises increasingly robust and versatile robotic capabilities for real-world applications.
