Emergent Mind

Generalization to New Sequential Decision Making Tasks with In-Context Learning

(2312.03801)
Published Dec 6, 2023 in cs.LG and cs.AI

Abstract

Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning. Recently, transformers have been shown to learn new language or vision tasks without any weight updates from only a few examples, also referred to as in-context learning. However, the sequential decision making setting poses additional challenges and has a lower tolerance for errors, since the environment's stochasticity or the agent's own actions can lead to unseen, and sometimes unrecoverable, states. In this paper, we use an illustrative example to show that naively applying transformers to sequential decision making problems does not enable in-context learning of new tasks. We then demonstrate how training on sequences of trajectories with certain distributional properties leads to in-context learning of new sequential decision making tasks. We investigate different design choices and find that larger model and dataset sizes, as well as more task diversity, environment stochasticity, and trajectory burstiness, all result in better in-context learning of new out-of-distribution tasks. By training on large diverse offline datasets, our model is able to learn new MiniHack and Procgen tasks without any weight updates from just a handful of demonstrations.
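The abstract's key data-construction idea is "trajectory burstiness": some training sequences place demonstrations from the query's own task in the transformer's context, while others fill the context with unrelated tasks. The sketch below illustrates one plausible way to sample such sequences; the function name, the `p_bursty` parameter, and the exact sampling scheme are illustrative assumptions, not the paper's actual code.

```python
import random

def make_sequence(trajectories_by_task, context_size, p_bursty, rng=random):
    """Assemble one training sequence: context trajectories followed by a query.

    With probability p_bursty, the context holds demonstrations from the same
    task as the query (a "bursty" sequence); otherwise the context is drawn
    from tasks sampled uniformly at random. Mixing both kinds of sequences is
    the distributional property the paper associates with in-context learning.
    (Hypothetical sketch; not the authors' implementation.)
    """
    tasks = list(trajectories_by_task)
    query_task = rng.choice(tasks)
    query = rng.choice(trajectories_by_task[query_task])
    if rng.random() < p_bursty:
        # Bursty: every context demonstration comes from the query's task.
        context = [rng.choice(trajectories_by_task[query_task])
                   for _ in range(context_size)]
    else:
        # Non-bursty: each context slot is drawn from a random task.
        context = [rng.choice(trajectories_by_task[rng.choice(tasks)])
                   for _ in range(context_size)]
    return context + [query]
```

At evaluation time, the same format lets a trained model be prompted with a handful of demonstrations of an unseen task in place of the sampled context, with no weight updates.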

