OMNI: Open-endedness via Models of human Notions of Interestingness (2306.01711v3)

Published 2 Jun 2023 in cs.AI and cs.LG

Abstract: Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast environment search space, but there are thus infinitely many possible tasks. Even after filtering for tasks the current agent can learn (i.e., learning progress), countless learnable yet uninteresting tasks remain (e.g., minor variations of previously learned tasks). An Achilles Heel of open-endedness research is the inability to quantify (and thus prioritize) tasks that are not just learnable, but also $\textit{interesting}$ (e.g., worthwhile and novel). We propose solving this problem by $\textit{Open-endedness via Models of human Notions of Interestingness}$ (OMNI). The insight is that we can utilize foundation models (FMs) as a model of interestingness (MoI), because they $\textit{already}$ internalize human concepts of interestingness from training on vast amounts of human-generated data, where humans naturally write about what they find interesting or boring. We show that FM-based MoIs improve open-ended learning by focusing on tasks that are both learnable $\textit{and interesting}$, outperforming baselines based on uniform task sampling or learning progress alone. This approach has the potential to dramatically advance the ability to intelligently select which tasks to focus on next (i.e., auto-curricula), and could be seen as AI selecting its own next task to learn, facilitating self-improving AI and AI-Generating Algorithms. Project website at https://www.jennyzhangzt.com/omni/

Overview of "OMNI: Open-endedness via Models of human Notions of Interestingness"

The paper "OMNI: Open-endedness via Models of human Notions of Interestingness" introduces a novel approach to address a key challenge in the domain of open-ended learning algorithms—specifically, the need to quantify and prioritize tasks based on their learnability and interestingness. The inability to effectively target tasks that are not just learnable but also intrinsically interesting has been a significant hurdle in advancing open-ended learning systems.

Introduction to Open-Endedness via OMNI

Open-ended algorithms aim to continuously explore and learn novel behaviors. Doing so requires navigating vast task spaces that contain infinitely many possible tasks. The Achilles heel of open-endedness research has been the inability to quantify which of those tasks are interesting: approaches based on learning progress alone often gravitate toward trivial or repetitive tasks, such as minor variations of tasks already learned, leading to inefficiencies.

The approach proposed in this paper, termed "Open-endedness via Models of human Notions of Interestingness" (OMNI), leverages foundation models (FMs) trained on vast amounts of human-generated content. Because humans naturally write about what they find interesting or boring, these models internalize human concepts of interestingness during training. The authors use them to guide task selection toward tasks that are both learnable and interesting, potentially advancing the self-improvement capabilities of AI through auto-curricula.
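
Concretely, an FM-based MoI can be queried by showing the model the tasks the agent has already learned and asking whether a candidate next task would be interesting. Below is a minimal sketch assuming an OpenAI-style chat API; the prompt wording, model choice, and `is_interesting` helper are illustrative assumptions, not the paper's exact setup:

```python
# Minimal sketch of a foundation model used as a Model of Interestingness (MoI).
# The prompt wording and model choice are illustrative assumptions, not the
# exact prompt used in the OMNI paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def is_interesting(candidate_task: str, learned_tasks: list[str]) -> bool:
    """Ask the FM whether a candidate task is a novel, worthwhile next step."""
    prompt = (
        "An agent has already learned these tasks:\n"
        + "\n".join(f"- {t}" for t in learned_tasks)
        + f"\n\nWould learning the task '{candidate_task}' be interesting, "
        "i.e., novel and worthwhile rather than a minor variation of the "
        "tasks above? Answer with exactly one word: yes or no."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")
```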

Methodology and Implementation

The OMNI methodology combines two primary elements: a Learning Progress (LP) curriculum and a Model of Interestingness (MoI) implemented with foundation models.

  1. Learning Progress Curriculum: This curriculum biases task selection towards tasks at the frontier of the agent’s capabilities by measuring bidirectional learning progress. This involves normalizing current task success rates and tracking changes over time, thereby focusing on tasks that exhibit the most meaningful progress for the agent’s learning trajectory.
  2. Model of Interestingness (MoI): The core innovation lies in using foundation models to predict which tasks are interesting, defined as novel and worthwhile in human terms. By consulting these models, OMNI focuses on tasks with high learning progress that are also interesting, effectively filtering out uninteresting, redundant challenges (a sketch combining both components follows this list).
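
The following is a minimal sketch of how these two components might combine; the EMA-based bidirectional learning-progress estimate and the masked sampling rule are one plausible instantiation, not the paper's exact formulation:

```python
import numpy as np

def bidirectional_lp(success_history: np.ndarray,
                     fast: float = 0.1, slow: float = 0.01) -> np.ndarray:
    """Estimate per-task learning progress as the absolute gap between a fast
    and a slow exponential moving average of task success rates.

    success_history: (timesteps, num_tasks) array of success rates in [0, 1].
    Taking the absolute value makes progress bidirectional: improvement and
    forgetting both raise a task's priority.
    """
    fast_ema = np.zeros(success_history.shape[1])
    slow_ema = np.zeros(success_history.shape[1])
    for step in success_history:
        fast_ema += fast * (step - fast_ema)
        slow_ema += slow * (step - slow_ema)
    return np.abs(fast_ema - slow_ema)

def sample_task(tasks: list[str], lp: np.ndarray, moi_mask: np.ndarray,
                rng: np.random.Generator = np.random.default_rng()) -> str:
    """Sample a training task: learning progress sets the weights, and the
    MoI mask zeroes out tasks the FM judged uninteresting."""
    weights = lp * moi_mask
    if weights.sum() == 0:  # nothing both learnable and interesting yet
        weights = np.ones(len(tasks))
    return tasks[rng.choice(len(tasks), p=weights / weights.sum())]
```

The absolute fast-slow gap revisits tasks whose success rate is changing in either direction, so newly improving tasks and recently forgotten ones both regain priority.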

Experiments and Results

Experiments were conducted across several environments, including Crafter and BabyAI, demonstrating OMNI's ability to outperform baseline methods both in average task success rates and in the number of tasks learned. Notably, OMNI's performance closely tracks that of an oracle with perfect knowledge of which tasks are interesting, supporting the effectiveness of the FM-based MoI.

Further, the paper extends these experiments to infinite task spaces, exemplified by the AI2-THOR domain, showing that OMNI not only proposes learnable and interesting tasks but also defines their reward functions through FM-generated code, a novel step for open-ended environments.
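
For illustration, below is the kind of success checker an FM might generate for a hypothetical proposed task such as "put a mug in the sink". The object-record format loosely mimics AI2-THOR metadata; the field names and the task itself are assumptions for illustration, not the paper's actual generated code:

```python
# Illustrative sketch of FM-generated success-checking code for a hypothetical
# task ("put a mug in the sink"). The object-record format loosely mimics
# AI2-THOR metadata; field names are assumptions, not the paper's output.

def task_success(objects: list[dict]) -> bool:
    """Return True if any mug is contained in any sink receptacle."""
    for obj in objects:
        if obj.get("objectType") == "Mug":
            parents = obj.get("parentReceptacles") or []
            if any("Sink" in receptacle_id for receptacle_id in parents):
                return True
    return False

def reward(objects: list[dict]) -> float:
    """Derive a sparse reward directly from the success checker."""
    return 1.0 if task_success(objects) else 0.0
```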

Implications and Future Prospects

The implications of OMNI are significant both practically and theoretically. Practically, this method provides a new pathway for developing AI systems capable of identifying and pursuing meaningful tasks without human intervention, potentially steering AI towards self-improving mechanisms. Theoretically, OMNI offers an innovative perspective on modeling human-like curiosity and novel task exploration within AI, which may lead to broader applications beyond current open-ended setups.

The paper suggests future directions, such as incorporating multi-modal models for richer task representations and exploring human feedback, akin to Reinforcement Learning from Human Feedback (RLHF), to further refine the Model of Interestingness. Such developments could help address the safety challenges inherent to open-ended systems by aligning them more closely with human values and ensuring beneficial AI trajectories.

Conclusion

Overall, this paper makes a significant contribution to the field of open-ended learning by leveraging foundation models to emulate human notions of interestingness in task selection. By addressing the challenges of infinite task spaces and intelligently prioritizing tasks, OMNI stands as a promising framework that enhances the capacity of AI systems to engage in perpetual, intriguing, and relevant learning. This represents a step towards not only more effective AI-generating algorithms but also towards AI systems that can autonomously drive meaningful innovation and discovery.

Authors (4)
  1. Jenny Zhang
  2. Joel Lehman
  3. Kenneth Stanley
  4. Jeff Clune