Procedural generation of meta-reinforcement learning tasks (2302.05583v2)
Abstract: Open-endedness stands to benefit from the ability to generate an infinite variety of diverse, challenging environments. One particularly interesting type of challenge is meta-learning ("learning-to-learn"), a hallmark of intelligent behavior. However, the number of meta-learning environments in the literature is limited. Here we describe a parametrized space for simple meta-reinforcement learning (meta-RL) tasks with arbitrary stimuli. The parametrization allows us to randomly generate an arbitrary number of novel simple meta-learning tasks. The parametrization is expressive enough to include many well-known meta-RL tasks, such as bandit problems, the Harlow task, T-mazes, the Daw two-step task and others. Simple extensions allow it to capture tasks based on two-dimensional topological spaces, such as full mazes or find-the-spot domains. We describe a number of randomly generated meta-RL domains of varying complexity and discuss potential issues arising from random generation.
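To make the idea of randomly generated simple meta-RL tasks concrete, here is a minimal sketch of a Harlow/bandit-style task generator in the spirit of the abstract. This is not the paper's actual parametrization or code; the class name, parameters, and observation format are illustrative assumptions. The key property it illustrates is that the rewarded stimulus is re-drawn every episode, so the agent must learn, within each episode, which arbitrary stimulus is currently rewarded.

```python
# Minimal illustrative sketch (not the paper's parametrization): a random
# generator of simple Harlow/bandit-style meta-RL tasks with arbitrary stimuli.
# Each task instance samples its own stimuli; each episode re-draws which
# stimulus is rewarded, so reward must be (re)learned within the episode.
import numpy as np


class RandomMetaBanditTask:
    """A randomly generated Harlow/bandit-style meta-RL task (hypothetical API).

    On reset, one of the task's `n_arms` arbitrary stimuli is secretly
    designated as rewarded. Within an episode of several trials, the agent
    must discover the rewarded stimulus by trial and error. A newly
    constructed instance (with fresh stimuli) plays the role of a newly
    generated task.
    """

    def __init__(self, n_arms=2, stim_dim=8, trials_per_episode=6, rng=None):
        self.n_arms = n_arms
        self.stim_dim = stim_dim
        self.trials_per_episode = trials_per_episode
        self.rng = rng or np.random.default_rng()
        # Arbitrary stimuli: random vectors stand in for images, tokens, etc.
        self.stimuli = self.rng.normal(size=(n_arms, stim_dim))

    def reset(self):
        """Start a new episode: re-draw the hidden rewarded stimulus."""
        self.rewarded_arm = self.rng.integers(self.n_arms)
        self.trial = 0
        return self._observation()

    def step(self, action):
        """Choose a position; reward depends on which stimulus sat there."""
        chosen_stimulus = self._order[action]
        reward = 1.0 if chosen_stimulus == self.rewarded_arm else 0.0
        self.trial += 1
        done = self.trial >= self.trials_per_episode
        return self._observation(), reward, done

    def _observation(self):
        # Present the stimuli in a random order each trial, so position carries
        # no information and reward must be bound to stimulus identity.
        self._order = self.rng.permutation(self.n_arms)
        return self.stimuli[self._order].ravel()


if __name__ == "__main__":
    task = RandomMetaBanditTask()
    obs = task.reset()
    done = False
    while not done:
        obs, reward, done = task.step(action=task.rng.integers(task.n_arms))
        print("reward:", reward)
```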