
Procedural generation of meta-reinforcement learning tasks (2302.05583v2)

Published 11 Feb 2023 in cs.LG and cs.AI

Abstract: Open-endedness stands to benefit from the ability to generate an infinite variety of diverse, challenging environments. One particularly interesting type of challenge is meta-learning ("learning-to-learn"), a hallmark of intelligent behavior. However, the number of meta-learning environments in the literature is limited. Here we describe a parametrized space for simple meta-reinforcement learning (meta-RL) tasks with arbitrary stimuli. The parametrization allows us to randomly generate an arbitrary number of novel simple meta-learning tasks. The parametrization is expressive enough to include many well-known meta-RL tasks, such as bandit problems, the Harlow task, T-mazes, the Daw two-step task and others. Simple extensions allow it to capture tasks based on two-dimensional topological spaces, such as full mazes or find-the-spot domains. We describe a number of randomly generated meta-RL domains of varying complexity and discuss potential issues arising from random generation.
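The paper's actual parametrization is not reproduced here; as an illustration of the general idea it describes, the following is a minimal sketch of randomly generating simple meta-RL tasks with arbitrary stimuli. The function names (`generate_task`, `run_episode`) and the Harlow-style "find the rewarded stimulus" structure are assumptions for the example, not the paper's own API.

```python
import numpy as np

def generate_task(rng, n_stimuli=2, stim_dim=8):
    """Randomly sample a new task: arbitrary stimulus vectors plus a
    hidden index indicating which stimulus is rewarded."""
    stimuli = rng.standard_normal((n_stimuli, stim_dim))
    target = int(rng.integers(n_stimuli))  # hidden rewarded stimulus
    return stimuli, target

def run_episode(rng, stimuli, target, policy, n_trials=6):
    """One meta-RL episode: the same hidden target persists across several
    trials, so an adaptive policy can 'learn to learn' within the episode."""
    total_reward = 0.0
    for _ in range(n_trials):
        action = policy(stimuli)       # choose among the presented stimuli
        total_reward += 1.0 if action == target else 0.0
    return total_reward

# Generate and run a batch of randomly sampled tasks.
rng = np.random.default_rng(0)
for _ in range(3):
    stimuli, target = generate_task(rng)
    random_policy = lambda s: int(rng.integers(len(s)))
    print(run_episode(rng, stimuli, target, random_policy))
```

Because each call to `generate_task` draws fresh stimuli and a fresh reward assignment, an arbitrary number of novel task instances can be produced from the same parametrized family, which is the core idea the abstract describes.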
