Generalization to New Sequential Decision Making Tasks with In-Context Learning (2312.03801v1)

Published 6 Dec 2023 in cs.LG and cs.AI

Abstract: Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning. Recently, transformers have been shown to learn new language or vision tasks without any weight updates from only a few examples, also referred to as in-context learning. However, the sequential decision making setting poses additional challenges, as it has a lower tolerance for errors: the environment's stochasticity or the agent's actions can lead to unseen, and sometimes unrecoverable, states. In this paper, we use an illustrative example to show that naively applying transformers to sequential decision making problems does not enable in-context learning of new tasks. We then demonstrate how training on sequences of trajectories with certain distributional properties leads to in-context learning of new sequential decision making tasks. We investigate different design choices and find that larger model and dataset sizes, as well as more task diversity, environment stochasticity, and trajectory burstiness, all result in better in-context learning of new out-of-distribution tasks. By training on large, diverse offline datasets, our model is able to learn new MiniHack and Procgen tasks without any weight updates from just a handful of demonstrations.
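The "trajectory burstiness" the abstract refers to can be illustrated with a small sketch of how training sequences might be assembled. This is a hypothetical construction, not the paper's actual data pipeline: all names (`make_training_sequence`, `p_bursty`, `num_context`) are assumptions, and trajectories are stood in for by opaque tokens. The idea is that, with some probability, the context trajectories preceding a query trajectory come from the same task as the query, which rewards the model for attending to its context rather than relying on weights alone.

```python
import random


def make_training_sequence(trajs_by_task, num_context=3, p_bursty=0.8, rng=random):
    """Assemble one training sequence: context trajectories, then a query.

    trajs_by_task: dict mapping task id -> list of trajectories (each
    trajectory can be any token sequence, e.g. [(obs, act), ...]).

    With probability p_bursty, all context trajectories are drawn from the
    same task as the query (a "bursty" sequence); otherwise they come from
    randomly chosen tasks. The model would be trained to predict the
    query's actions given the full concatenated sequence.
    """
    tasks = list(trajs_by_task)
    query_task = rng.choice(tasks)
    query = rng.choice(trajs_by_task[query_task])

    if rng.random() < p_bursty:
        context_tasks = [query_task] * num_context
    else:
        context_tasks = [rng.choice(tasks) for _ in range(num_context)]

    context = [rng.choice(trajs_by_task[t]) for t in context_tasks]
    return context, query
```

Sweeping `p_bursty` between 0 and 1 is one way to probe the abstract's claim that higher trajectory burstiness yields better in-context learning of new tasks.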

