Benchmarking General-Purpose In-Context Learning (2405.17234v6)
Abstract: In-context learning (ICL) empowers generative models to tackle new tasks effectively and efficiently on the fly, without relying on any hand-crafted optimization techniques. In this paper, we study extending ICL to a broader range of tasks with a longer learning horizon and higher improvement potential, namely General-Purpose In-Context Learning (GPICL). To this end, we introduce two lightweight benchmarks specifically crafted to train and evaluate GPICL capabilities. Each benchmark encompasses a vast number of tasks characterized by significant task variance, and the tasks are designed to promote long-horizon in-context learning through continuous generation and interaction, covering domains such as language modeling, decision-making, and world modeling. The benchmarks require models to leverage context and interaction history to improve their capabilities, which we believe to be the key characteristics of GPICL. Our experiments indicate that the diversity of training tasks is positively correlated with the ability to generalize through ICL, but inversely correlated with zero-shot capabilities. Additionally, our findings indicate that parameter scale alone may not be crucial for ICL or GPICL, and suggest alternative approaches such as increasing the scale of contexts and memory states.
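To make the central measurement concrete, the sketch below (a minimal illustration, not the paper's benchmark code) shows one way to quantify in-context improvement: sample many high-variance tasks, roll out long sequences, and compare prediction loss early versus late in the context. The random-Markov-chain task generator and the count-based predictor are illustrative assumptions standing in for the benchmark's task distributions and a trained model.

```python
import numpy as np

def sample_task(n_states=8, seed=None):
    # One "task" = a random Markov chain; a stand-in for a sample
    # from a highly diverse task distribution (illustrative only).
    r = np.random.default_rng(seed)
    return r.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transitions

def rollout(transitions, length, r):
    # Generate a trajectory from the task's transition matrix.
    states = [int(r.integers(transitions.shape[0]))]
    for _ in range(length - 1):
        states.append(int(r.choice(transitions.shape[0], p=transitions[states[-1]])))
    return np.array(states)

def count_based_predictor(context, n_states):
    # Toy in-context learner: add-one-smoothed bigram counts over the context,
    # standing in for a trained sequence model's next-token distribution.
    counts = np.ones((n_states, n_states))
    for a, b in zip(context[:-1], context[1:]):
        counts[a, b] += 1
    return counts[context[-1]] / counts[context[-1]].sum()

def per_position_nll(predict_fn, seq, n_states):
    # Negative log-likelihood at each position; a downward trend across
    # positions indicates that the context is being leveraged (ICL).
    nll = []
    for t in range(1, len(seq)):
        probs = predict_fn(seq[:t], n_states)
        nll.append(-np.log(probs[seq[t]] + 1e-12))
    return np.array(nll)

# Compare average loss early vs. late in the context across many sampled tasks.
early, late = [], []
for task_id in range(100):
    transitions = sample_task(seed=task_id)
    seq = rollout(transitions, 128, np.random.default_rng(1000 + task_id))
    nll = per_position_nll(count_based_predictor, seq, transitions.shape[0])
    early.append(nll[:16].mean())
    late.append(nll[-16:].mean())

print(f"early-context NLL: {np.mean(early):.3f}  late-context NLL: {np.mean(late):.3f}")
```

Under a protocol of this kind, a model with strong GPICL ability should show a large early-to-late drop in loss on unseen tasks, whereas a model relying only on zero-shot priors would show a flat curve.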