SplAgger: Split Aggregation for Meta-Reinforcement Learning (2403.03020v3)

Published 5 Mar 2024 in cs.LG and cs.AI

Abstract: A core ambition of reinforcement learning (RL) is the creation of agents capable of rapid learning in novel tasks. Meta-RL aims to achieve this by directly learning such agents. Black box methods do so by training off-the-shelf sequence models end-to-end. By contrast, task inference methods explicitly infer a posterior distribution over the unknown task, typically using distinct objectives and sequence models designed to enable task inference. Recent work has shown that task inference methods are not necessary for strong performance. However, it remains unclear whether task inference sequence models are beneficial even when task inference objectives are not. In this paper, we present evidence that task inference sequence models are indeed still beneficial. In particular, we investigate sequence models with permutation invariant aggregation, which exploit the fact that, due to the Markov property, the task posterior does not depend on the order of data. We empirically confirm the advantage of permutation invariant sequence models without the use of task inference objectives. However, we also find, surprisingly, that there are multiple conditions under which permutation variance remains useful. Therefore, we propose SplAgger, which uses both permutation variant and invariant components to achieve the best of both worlds, outperforming all baselines evaluated on continuous control and memory environments. Code is provided at https://github.com/jacooba/hyper.
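
The split-aggregation idea in the abstract lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration, not the authors' implementation (see the linked repository for that): each transition is embedded, then summarized by both a permutation-invariant pooling branch, which ignores the order of data as the Markov-property argument permits, and a permutation-variant recurrent branch; the two summaries are concatenated into a single context. All module names, the choice of max pooling and a GRU, and the dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SplitAggregation(nn.Module):
    """Illustrative sketch (not the paper's code): combine a permutation-invariant
    summary with a permutation-variant one, as in SplAgger's split aggregation."""

    def __init__(self, transition_dim: int, hidden_dim: int):
        super().__init__()
        # Per-transition encoder shared by both branches.
        self.encoder = nn.Sequential(nn.Linear(transition_dim, hidden_dim), nn.ReLU())
        # Permutation-variant branch: a recurrent model over the sequence.
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        # transitions: (batch, time, transition_dim), e.g. concatenated (s, a, r, s').
        h = self.encoder(transitions)
        # Invariant branch: max pooling over time is unaffected by transition order.
        invariant = h.max(dim=1).values
        # Variant branch: the final GRU state depends on transition order.
        _, last_state = self.rnn(h)
        variant = last_state[-1]  # (batch, hidden_dim)
        # Concatenate the two summaries to get "the best of both worlds".
        return torch.cat([invariant, variant], dim=-1)

# Example: summarize a batch of 4 trajectories of 10 transitions each.
agg = SplitAggregation(transition_dim=8, hidden_dim=32)
context = agg(torch.randn(4, 10, 8))  # shape: (4, 64)
```

How the invariant pooling is chosen and how the two branches are combined are design choices; the concatenation above is just one plausible way to expose both order-sensitive and order-insensitive information to the downstream policy.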
