
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (2405.15568v2)

Published 24 May 2024 in cs.AI

Abstract: Open-ended and AI-generating algorithms aim to continuously generate and solve increasingly complex tasks indefinitely, offering a promising path toward more general intelligence. To accomplish this grand vision, learning must occur within a vast array of potential tasks. Existing approaches to automatically generating environments are constrained within manually predefined, often narrow distributions of environment, limiting their ability to create any learning environment. To address this limitation, we introduce a novel framework, OMNI-EPIC, that augments previous work in Open-endedness via Models of human Notions of Interestingness (OMNI) with Environments Programmed in Code (EPIC). OMNI-EPIC leverages foundation models to autonomously generate code specifying the next learnable (i.e., not too easy or difficult for the agent's current skill set) and interesting (e.g., worthwhile and novel) tasks. OMNI-EPIC generates both environments (e.g., an obstacle course) and reward functions (e.g., progress through the obstacle course quickly without touching red objects), enabling it, in principle, to create any simulatable learning task. We showcase the explosive creativity of OMNI-EPIC, which continuously innovates to suggest new, interesting learning challenges. We also highlight how OMNI-EPIC can adapt to reinforcement learning agents' learning progress, generating tasks that are of suitable difficulty. Overall, OMNI-EPIC can endlessly create learnable and interesting environments, further propelling the development of self-improving AI systems and AI-Generating Algorithms. Project website with videos: https://dub.sh/omniepic

The paper "OMNI-EPIC: Open-endedness via Models of Human Notions of Interestingness with Environments Programmed in Code" presents a novel framework designed for generating an endless stream of diverse and progressively challenging tasks for reinforcement learning (RL) agents. This approach leverages foundation models to automatically create a wide array of learning environments in code, addressing the limitations of previous methods constrained by narrow, predefined distributions of tasks.

Overview

OMNI-EPIC stands out by combining the strengths of Open-endedness via Models of human Notions of Interestingness (OMNI) with Environments Programmed in Code (EPIC). Unlike previous methods limited to narrow task spaces, OMNI-EPIC harnesses foundation models to dynamically generate code that specifies both environments and reward functions. The resulting tasks are tailored to be novel yet solvable, progressively challenging the learning agents.
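To make "environments and reward functions specified in code" concrete, here is an illustrative sketch of the kind of task code the framework might emit. This is not actual OMNI-EPIC output; the class and field names are hypothetical, and the physics simulator the paper targets (PyBullet) is abstracted away into a simple state dictionary for brevity.

```python
class ObstacleCourseTask:
    """Hypothetical generated task: reach the goal quickly
    without touching red obstacles (cf. the abstract's example)."""

    GOAL_X = 10.0          # course length, in metres (assumed)
    TIME_PENALTY = 0.01    # per-step cost encourages speed
    RED_PENALTY = 1.0      # cost for touching a red obstacle

    def reward(self, state):
        # state: {"x": float, "touched_red": bool}
        progress = state["x"] / self.GOAL_X
        r = progress - self.TIME_PENALTY
        if state["touched_red"]:
            r -= self.RED_PENALTY
        return r

    def is_success(self, state):
        # Success = full course traversed, no red obstacle touched.
        return state["x"] >= self.GOAL_X and not state["touched_red"]
```

Because tasks are arbitrary code rather than points in a parameterized space, any environment and objective expressible in the simulator can, in principle, be generated this way.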

Methodology

The key components of OMNI-EPIC include:

  • Task Archive (Section 3.1): Maintains a growing collection of tasks the agent has successfully learned and tasks it has failed.
  • Task Generator (Section 3.2): Utilizes LLMs to create new, interesting tasks based on similarities to previously encountered tasks.
  • Environment Generator (Section 3.3): Converts task descriptions into executable code that defines the learning environment.
  • Model of Interestingness (Section 3.4): Uses LLMs to evaluate whether generated tasks are novel and worthwhile.
  • Training Agents with RL (Section 3.5): Applies reinforcement learning to train agents within these generated environments.
  • Success Detector (Section 3.6): Automatically assesses task completion using LLMs or Vision-LLMs (VLMs).
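The components above can be connected into a single outer loop. The sketch below is a minimal, assumption-laden rendering of that loop: each component is passed in as a plain callable, whereas the real system implements them with LLMs, VLMs, and an RL trainer.

```python
def omni_epic_loop(task_generator, env_generator, interestingness_model,
                   train_agent, success_detector, iterations=10):
    """Minimal sketch of the OMNI-EPIC outer loop (Sections 3.1-3.6).
    All callables here are stand-ins for the paper's LLM/RL components."""
    archive = []  # task archive: (description, succeeded) pairs (3.1)
    for _ in range(iterations):
        # Propose a new task conditioned on the archive (3.2).
        description = task_generator(archive)
        # Skip tasks the model deems uninteresting (3.4).
        if not interestingness_model(description, archive):
            continue
        # Turn the description into executable environment code (3.3).
        env = env_generator(description)
        # Train an RL agent in the generated environment (3.5).
        agent = train_agent(env)
        # Judge completion and archive the outcome either way (3.6, 3.1).
        succeeded = success_detector(env, agent)
        archive.append((description, succeeded))
    return archive
```

Archiving failures as well as successes matters: the task generator conditions on both, steering it toward tasks at the frontier of the agent's current abilities.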

Results

OMNI-EPIC's efficacy is demonstrated in two settings: a long run without agent training, to probe the rate and diversity of task generation, and a short run with RL training, to evaluate how generated tasks adapt to actual learning progress.

  1. Long Run without Training:
    • Generated an expansive set of tasks, showing a diverse spectrum of challenges ranging from straightforward navigation to complex object interactions.
    • Figure 2 in the paper showcases the evolution of tasks over 200 iterations, highlighting the algorithm's ability to create novel and diverse challenges.
  2. Short Run with Training:
    • Demonstrated the generation of increasingly complex yet solvable tasks tailored to the agent's capabilities.
    • The RL agents trained on these tasks showed progressive learning, illustrating the effectiveness of OMNI-EPIC in creating a developmental curriculum.
    • Figure 3 provides visual evidence of the tasks and the respective trained agents.

Implications and Future Developments

OMNI-EPIC significantly extends the capabilities of AI-generating algorithms (AI-GAs) by demonstrating a scalable approach to open-ended environment generation. Notably, it is a step toward Darwin Completeness: the ability to generate any possible learning environment.

Practical Implications:

  • The ability to generate diverse learning environments can enhance the training of generalist AI agents capable of adapting to a wide range of tasks.

Theoretical Implications:

  • This approach highlights the importance of task diversity and the dynamic adaptation of learning environments for the development of general intelligence in AI systems.

Future Developments:

  • Future work could explore the use of increasingly sophisticated foundation models to further expand the capabilities of OMNI-EPIC.
  • Enhancing the success detector with more advanced VLMs could improve the accuracy and reliability of task completion assessments.
  • Developing methods for integrating this framework with real-world applications, such as robotics and virtual simulations, could also be an exciting avenue for research.

Conclusion

OMNI-EPIC represents a significant advancement in the creation of open-ended, automatically generated tasks for reinforcement learning. By integrating models of human notions of interestingness and environments programmed in code, this framework offers a robust approach to developing self-improving AI systems. The contributions of this work pave the way for future research towards achieving general intelligence and understanding the fundamental nature of creativity and learning in artificial agents.

Authors: Maxence Faldor, Jenny Zhang, Antoine Cully, Jeff Clune