
NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds

Published 7 Jan 2024 in cs.AI (arXiv:2401.03546v1)

Abstract: As AI agents leave the lab and venture into the real world as autonomous vehicles, delivery robots, and cooking robots, it is increasingly necessary to design and comprehensively evaluate algorithms that tackle the "open-world". To this end, we introduce NovelGym, a flexible and adaptable ecosystem designed to simulate gridworld environments, serving as a robust platform for benchmarking reinforcement learning (RL) and hybrid planning and learning agents in open-world contexts. The modular architecture of NovelGym facilitates rapid creation and modification of task environments, including multi-agent scenarios, with multiple environment transformations, thus providing a dynamic testbed for researchers to develop open-world AI agents.


Summary

  • The paper introduces NovelGym, a modular ecosystem that tests hybrid planning and learning agents using novelty injection to evaluate adaptability.
  • It utilizes a comprehensive evaluation protocol featuring metrics such as pre-novelty performance, adaptation duration, and post-adaptation efficiency across various novelty scenarios.
  • Experiments reveal that hybrid models adapt faster than pure reinforcement learning agents, highlighting potential for real-world autonomous applications.

Introduction

The paper "NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds" introduces NovelGym, an adaptable platform aimed at testing reinforcement learning (RL) and hybrid planning agents in open-world settings. The motivation behind developing NovelGym is the increasing demand for AI systems that can efficiently navigate and adapt within open-world environments, characterized by the presence of novel and unforeseen elements. As agents transition from closed laboratory settings to the real-world applications – such as autonomous vehicles and delivery robots – it becomes crucial to devise algorithms that can handle the complexities intrinsic to open-world scenarios, including recognizing and adapting to new concepts that were not part of their initial training.

NovelGym provides a modular architecture that enables rapid development and testing of agents' ability to perceive and adapt to novel environmental conditions without requiring exhaustive pre-learning of every possible scenario (Figure 1).

Figure 1: NovelGym environment representation. The figure shows a gridworld environment with various entities, as described in the legend. The red box highlights the novelty in the environment.

System Design and Environment

At its core, NovelGym is designed to simulate gridworld environments that can be adapted to test various novelty scenarios. The system comprises several key components, including dynamic environment modules and agent modules that allow for the seamless integration and interaction of agents within an environment that may contain both known and unknown elements. This modular setup is depicted in the system design of the NovelGym ecosystem (Figure 2).

Figure 2: System design of the NovelGym ecosystem. Blue highlights the environment modules, and purple highlights the agent modules.

Agents in NovelGym are equipped with sensors, allowing them to generate observations and interact with the environment via a set of predefined actions. The environment supports multi-agent systems, ensuring that researchers can explore collaborative and competitive multi-agent dynamics. Novelty injection, a key feature of NovelGym, is facilitated by environment transformations, allowing researchers to inject or modify elements within the environment to simulate novel scenarios.
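
To make the transformation idea concrete, the sketch below shows one way a novelty could be injected into a gridworld via a wrapper that mutates the environment at a chosen episode. All class and method names here are illustrative assumptions, not the actual NovelGym API.

```python
class GridWorldEnv:
    """Minimal stand-in for a gridworld environment (hypothetical)."""
    def __init__(self):
        self.entities = {"tree": 4, "crafting_table": 1}

    def reset(self):
        return {"entities": dict(self.entities)}  # observation

    def step(self, action):
        obs = {"entities": dict(self.entities)}
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info


class NoveltyWrapper:
    """Applies an environment transformation at a chosen episode,
    mimicking the novelty-injection idea described above."""
    def __init__(self, env, transform, inject_at_episode=100):
        self.env = env
        self.transform = transform
        self.inject_at = inject_at_episode
        self.episode = 0

    def reset(self):
        self.episode += 1
        if self.episode == self.inject_at:
            self.transform(self.env)  # mutate the environment once
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


def remove_trees(env):
    """Example detrimental novelty: a resource disappears."""
    env.entities["tree"] = 0


env = NoveltyWrapper(GridWorldEnv(), remove_trees, inject_at_episode=100)
```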

Evaluation Protocol and Metrics

NovelGym proposes a comprehensive protocol for evaluating novelty-aware agents. This involves an initial training phase in a stable, non-novel environment, followed by a novelty injection phase wherein environmental changes challenge the agent's adaptability. The process culminates in an adaptation phase where agents refine their strategies, and a final evaluation phase assesses how well agents have adapted.
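
A minimal sketch of this four-phase protocol is shown below. The phase lengths and the agent/environment interfaces (`act`, `update`, `inject_novelty`) are assumptions for illustration, not NovelGym's actual API.

```python
def run_protocol(agent, env, pre_episodes=500, adapt_episodes=500,
                 eval_episodes=100):
    """Run the four-phase evaluation and return per-episode returns."""
    def run(n_episodes):
        returns = []
        for _ in range(n_episodes):
            obs, done, total = env.reset(), False, 0.0
            while not done:
                obs, reward, done, _ = env.step(agent.act(obs))
                total += reward
            agent.update()
            returns.append(total)
        return returns

    pre = run(pre_episodes)      # 1. train in the stable, non-novel world
    env.inject_novelty()         # 2. inject the novelty (hypothetical hook;
                                 #    the wrapper sketch above does this
                                 #    automatically at a chosen episode)
    adapt = run(adapt_episodes)  # 3. agent adapts to the changed world
    final = run(eval_episodes)   # 4. final evaluation of the adapted agent
    return pre, adapt, final
```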

Through a set of defined metrics, including pre-novelty performance, novelty impact, adaptation duration, and post-adaptation efficiency, NovelGym facilitates detailed analysis of an agent's ability to handle novelties. These metrics provide insights into the immediate impact of novelty on performance, the efficiency and success of adaptation strategies, and any potential long-term benefits or detriments (Figure 3).

Figure 3: Illustration of performance metrics for open-world agents.
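
As an illustration only, metrics of this kind could be computed from per-episode return curves roughly as follows. The recovery threshold and smoothing window here are arbitrary choices, not the paper's definitions.

```python
import numpy as np

def novelty_metrics(pre, post, recovery_frac=0.95, window=10):
    """Compute illustrative metrics from per-episode returns collected
    before (pre) and after (post) novelty injection."""
    pre_perf = np.mean(pre[-50:])                 # pre-novelty performance
    impact = pre_perf - np.mean(post[:window])    # immediate novelty impact
    target = recovery_frac * pre_perf
    # Adaptation duration: first episode whose trailing-window average
    # recovers to the target; len(post) means "never recovered".
    duration = next(
        (i for i in range(window, len(post) + 1)
         if np.mean(post[i - window:i]) >= target),
        len(post),
    )
    post_eff = np.mean(post[-50:]) / pre_perf     # post-adaptation efficiency
    return {
        "pre_novelty_performance": pre_perf,
        "novelty_impact": impact,
        "adaptation_duration": duration,
        "post_adaptation_efficiency": post_eff,
    }
```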

Experiments and Results

The paper details experiments across five key novelty scenarios within NovelGym, covering both beneficial and detrimental novelties. Agents implemented with various architectures, including reinforcement learning with an Intrinsic Curiosity Module (ICM), exhibit varied levels of adaptability and performance across the scenarios. The results reveal significant differences in how agents cope with novelties, with hybrid models often adapting more quickly than pure RL models.
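
For reference, the standard ICM formulation (Pathak et al., 2017) rewards the agent for transitions its learned forward model predicts poorly. A minimal PyTorch sketch follows; the network sizes are arbitrary placeholders, and the paper's exact agent configuration may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICM(nn.Module):
    """Intrinsic Curiosity Module: the intrinsic reward is the forward
    model's prediction error in a learned feature space."""
    def __init__(self, obs_dim, n_actions, feat_dim=64):
        super().__init__()
        self.n_actions = n_actions
        self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        # Forward model: predict next features from current features + action.
        self.forward_model = nn.Linear(feat_dim + n_actions, feat_dim)
        # Inverse model: predict the action from feature pairs; its loss
        # shapes the encoder during training (training loop omitted).
        self.inverse_model = nn.Linear(2 * feat_dim, n_actions)

    def intrinsic_reward(self, obs, action, next_obs):
        # obs/next_obs: float tensors; action: long tensor of action indices.
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        a = F.one_hot(action, self.n_actions).float()
        phi_pred = self.forward_model(torch.cat([phi, a], dim=-1))
        # Curiosity bonus: poorly predicted transitions are rewarded.
        return 0.5 * (phi_pred - phi_next).pow(2).sum(dim=-1)
```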

NovelGym highlights the challenges and opportunities in designing robust AI agents capable of operating in open-world settings. By capturing intricate details such as immediate impacts and long-term adaptation efficiency, the ecosystem provides a strong foundation for future research.

Conclusion

NovelGym provides an essential platform for advancing research in open-world learning by offering robust tools to design, implement, and test agents in a controlled yet highly flexible environment. It opens avenues for exploring the integration of planning and learning paradigms, procedural novelty generation, and more complex multi-agent interactions, setting the stage for agents that deal more efficiently with real-world complexities and interact more seamlessly with dynamic environments. Future work will focus on leveraging NovelGym's modular capabilities to investigate human-in-the-loop learning, to expand the framework to more intricate multi-agent dynamics, and to integrate continual learning processes.
