How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes (2404.03558v1)

Published 4 Apr 2024 in cs.CL and cs.LG

Abstract: Large language models (LLMs) have recently shown the extraordinary ability to perform unseen tasks based on few-shot examples provided as text, also known as in-context learning (ICL). While recent works have attempted to understand the mechanisms driving ICL, few have explored training strategies that incentivize these models to generalize to multiple tasks. Multi-task learning (MTL) for generalist models is a promising direction that offers transfer learning potential, enabling large parameterized models to be trained from simpler, related tasks. In this work, we investigate the combination of MTL with ICL to build models that efficiently learn tasks while being robust to out-of-distribution examples. We propose several effective curriculum learning strategies that allow ICL models to achieve higher data efficiency and more stable convergence. Our experiments reveal that ICL models can effectively learn difficult tasks by training on progressively harder tasks while mixing in prior tasks, denoted as mixed curriculum in this work. Our code and models are available at https://github.com/harmonbhasin/curriculum_learning_icl .
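The "mixed curriculum" idea described in the abstract — advancing through progressively harder task stages while continuing to sample from earlier stages — can be sketched as a simple training-step scheduler. This is a minimal illustrative sketch, not the authors' implementation: the function name, the example task names, and the `prior_mix` parameter are all assumptions made for illustration.

```python
import random

def mixed_curriculum_schedule(stages, steps_per_stage, prior_mix=0.3, seed=0):
    """Yield one task stage per training step (hypothetical sketch).

    At stage i, each step draws the current stage with probability
    (1 - prior_mix); otherwise it revisits a uniformly chosen earlier
    stage, so previously learned tasks stay in the training mix.
    """
    rng = random.Random(seed)
    schedule = []
    for i, stage in enumerate(stages):
        for _ in range(steps_per_stage):
            if i > 0 and rng.random() < prior_mix:
                # Mix in a prior (easier) task to avoid forgetting it.
                schedule.append(rng.choice(stages[:i]))
            else:
                schedule.append(stage)
    return schedule

# Illustrative function classes ordered by assumed difficulty.
tasks = ["linear_regression", "sparse_linear", "decision_tree"]
sched = mixed_curriculum_schedule(tasks, steps_per_stage=4, prior_mix=0.5, seed=1)
print(sched)
```

With `prior_mix=0.0` this degenerates to a plain sequential curriculum; raising it keeps more of the training budget on earlier tasks, which is the distinction the paper draws between a standard curriculum and the mixed curriculum.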
