Evaluating Collaborative Autonomy in Opposed Environments using Maritime Capture-the-Flag Competitions (2404.17038v1)

Published 25 Apr 2024 in cs.RO

Abstract: The objective of this work is to evaluate multi-agent artificial intelligence methods deployed on teams of unmanned surface vehicles (USVs) in an adversarial environment. Autonomous agents were evaluated in real-world scenarios using the Aquaticus test bed, a Capture-the-Flag (CTF) style competition between teams of USVs. Cooperative teaming algorithms founded on behavior-based optimization and on deep reinforcement learning (RL) were deployed on these USVs in two-versus-two teams and tested against each other during a competition period in the fall of 2023. Deep RL for the USV agents was trained in the Pyquaticus test bed, a lightweight gymnasium environment that supports simulated low-level CTF training. The results of the experiment demonstrate that, as implemented in these competitions, rule-based cooperation among behavior-based agents outperformed agents trained with deep RL. Tighter integration of the Pyquaticus gymnasium environment with MOOS-IvP, in terms of configuration and control schema, should allow more competitive CTF games in future studies. As experimental deep RL methods continue to develop, the authors expect the competitive gap between behavior-based autonomy and deep RL to narrow. This report therefore outlines the overall competition, methods, and results, with an emphasis on future work such as reward shaping, sim-to-real methodologies, and extending rule-based cooperation so that agents react to safety and security events in accordance with human experts' intent and rules for executing safety and security processes.
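
The reward shaping mentioned as future work can be illustrated with potential-based shaping, which adds gamma * phi(s') - phi(s) to the environment's native reward and is known to leave the optimal policy unchanged. The sketch below is a minimal illustration only: it wraps a generic single-agent gymnasium environment, and the distance-to-flag potential and the observation keys it reads ("agent_pos", "enemy_flag_pos") are assumptions for illustration, not the paper's reward design or the actual Pyquaticus observation format.

    import math
    import gymnasium as gym

    class PotentialShapingWrapper(gym.Wrapper):
        """Adds a potential-based shaping term, gamma * phi(s') - phi(s), to the reward."""

        def __init__(self, env, potential_fn, gamma=0.99):
            super().__init__(env)
            self.potential_fn = potential_fn
            self.gamma = gamma
            self._last_phi = 0.0

        def reset(self, **kwargs):
            obs, info = self.env.reset(**kwargs)
            self._last_phi = self.potential_fn(obs)
            return obs, info

        def step(self, action):
            obs, reward, terminated, truncated, info = self.env.step(action)
            phi = self.potential_fn(obs)
            shaped_reward = reward + self.gamma * phi - self._last_phi
            self._last_phi = phi
            return obs, shaped_reward, terminated, truncated, info

    # Hypothetical potential: negative distance from the agent to the opposing flag.
    # The keys "agent_pos" and "enemy_flag_pos" are placeholders; a real CTF
    # environment such as Pyquaticus may expose its observations differently.
    def distance_to_flag_potential(obs):
        ax, ay = obs["agent_pos"]
        fx, fy = obs["enemy_flag_pos"]
        return -math.hypot(fx - ax, fy - ay)

    # Usage, assuming some CTF gymnasium environment `ctf_env`:
    # shaped_env = PotentialShapingWrapper(ctf_env, distance_to_flag_potential, gamma=0.99)

Because the shaping terms telescope over an episode, this densifies the learning signal without changing which policies are optimal; a multi-agent PettingZoo-style environment would need an analogous per-agent wrapper.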

References (14)
  1. A. Gupta, M. Novitzky, and M. Benjamin, “Learning autonomous marine behaviors in MOOS-IvP,” 2018, pp. 1–10.
  2. M. Novitzky, P. Robinette, M. R. Benjamin, C. Fitzgerald, and H. Schmidt, “Aquaticus: Publicly available datasets from a marine human-robot teaming testbed,” 2019, pp. 392–400.
  3. P. Newman, “MOOS - Mission Oriented Operating Suite,” 2008.
  4. P. Spencer, P. Dasgupta, M. McCarrick, M. Novitzky, D. Hubczenko, S. Redfield, J. James, A. Jeffery, and R. Mittu, “Opposed artificial intelligence: Developing robustness to adversarial attacks in attacker-defender games via ai-based strategic game-playing,” 2021, pp. 18:1–18:12. [Online]. Available: https://www.sto.nato.int/publications/STO%20Meeting%20Proceedings/STO-MP-IST-190/MP-IST-190-18.pdf
  5. J. Kliem and P. Dasgupta, “Reward shaping for improved learning in real-time strategy game play,” arXiv preprint arXiv:2311.16339, Nov. 2023. [Online]. Available: http://arxiv.org/abs/2311.16339
  6. M. R. Benjamin, H. Schmidt, P. M. Newman, and J. J. Leonard, “Nested autonomy for unmanned marine vehicles with MOOS-IvP,” Journal of Field Robotics, vol. 27, pp. 834–875, Nov. 2010.
  7. M. Mann, P. Crowley, J. Kliem, P. Puma, and Z. Serlin, “Pyquaticus.” [Online]. Available: https://github.com/mit-ll-trusted-autonomy/pyquaticus
  8. E. Liang, R. Liaw, R. Nishihara, P. Moritz, R. Fox, K. Goldberg, J. Gonzalez, M. Jordan, and I. Stoica, “RLlib: Abstractions for distributed reinforcement learning,” in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80.   PMLR, 10–15 Jul 2018, pp. 3053–3062. [Online]. Available: https://proceedings.mlr.press/v80/liang18b.html
  9. J. K. Terry, B. Black, N. Grammel, M. Jayakumar, A. Hari, R. Sullivan, L. Santos, R. Perez, C. Horsch, C. Dieffendahl, N. L. Williams, Y. Lokesh, and P. Ravi, “PettingZoo: A standard API for multi-agent reinforcement learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 15032–15043, 2021.
  10. A. Hill, A. Raffin, M. Ernestus, A. Gleave, A. Kanervisto, R. Traore, P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, and Y. Wu, “Stable baselines,” https://github.com/hill-a/stable-baselines, 2018.
  11. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
  12. S. Fujimoto, H. van Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” in International Conference on Machine Learning.   PMLR, 2018, pp. 1587–1596.
  13. R. S. Sutton, D. Precup, and S. Singh, “Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning,” Artificial Intelligence, vol. 112, no. 1-2, pp. 181–211, 1999.
  14. H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, no. 1, 2016.