
Many-Shot In-Context Learning (2404.11018v3)

Published 17 Apr 2024 in cs.LG, cs.AI, and cs.CL

Abstract: LLMs excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases, can learn high-dimensional functions with numerical inputs, and performs comparably to fine-tuning. We also find that inference cost increases linearly in the many-shot regime, and frontier LLMs benefit from many-shot ICL to varying degrees. Our analysis also reveals the limitations of next-token prediction loss as an indicator of downstream ICL performance.

Investigating Many-Shot In-Context Learning in LLMs

Introduction to In-Context Learning

In-context learning (ICL) enables LLMs to adapt to new tasks from example input-output pairs, termed "shots," supplied in the prompt at inference time. Because the model's context window limits how many tokens a prompt can hold, most prior research has focused on few-shot ICL. The expanded context windows of recent LLMs, however, make it possible to study many-shot ICL, where hundreds or thousands of shots can be used. This shift promises greater task adaptability and performance improvements, but it also introduces new challenges, such as dependence on large volumes of high-quality human-generated outputs.

Key Contributions of the Study

The paper offers several significant insights into the scaling of in-context examples:

  • Performance Gains Across Tasks: Extending in-context shots consistently improved LLM performance across various tasks compared to traditional few-shot learning, particularly when leveraging the larger context windows of the Gemini 1.5 Pro model.
  • Reinforced and Unsupervised ICL: Innovations include Reinforced ICL, which employs model-generated rationales instead of human-written ones, and Unsupervised ICL, which uses problem sets without paired solutions, reducing reliance on labor-intensive human-generated content.
  • Overcoming Pre-Training Biases: The paper demonstrates that many-shot ICL can counteract biases from the pre-training phase, particularly when provided with a sufficient number of examples.
  • Exploration of Non-Natural Language Prediction Tasks: The expansion into tasks that involve logical or structured outputs beyond standard text indicates that many-shot ICL can be effective for a broader array of applications than previously possible with few-shot approaches.

Detailed Analysis and Findings

Scaling with Context Length

The research shows that task performance improves as the number of in-context shots increases, up to the limits of the available context length. This scaling holds across tasks as diverse as machine translation, summarization, and complex reasoning, showcasing the utility of many-shot approaches facilitated by modern LLMs with expanded context capabilities.
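The mechanical difference between few-shot and many-shot prompting is simply the number of example pairs packed into the prompt. A minimal sketch of such a prompt builder (the `Input:`/`Output:` template and function names are illustrative assumptions, not the paper's exact format):

```python
def build_many_shot_prompt(examples, query, shot_count):
    """Assemble an ICL prompt from (input, output) example pairs.

    The same template serves few-shot (small shot_count) and many-shot
    (hundreds or thousands of pairs); the only practical limit is the
    model's context window.
    """
    shots = examples[:shot_count]
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in shots]
    # The final block poses the query and leaves the output for the model.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

demo = [("2+2", "4"), ("3+5", "8"), ("7+1", "8")]
prompt = build_many_shot_prompt(demo, "6+4", shot_count=2)
```

Scaling up is then just a matter of raising `shot_count` until the prompt approaches the context limit.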

Reinforced and Unsupervised In-Context Learning

  • Reinforced ICL: By sampling model-generated solutions and retaining those whose final answers are correct, this approach offers a scalable alternative to expensive human-generated rationales, showing considerable promise on reasoning tasks.
  • Unsupervised ICL: This variant removes solutions from the prompt altogether, presenting the model only with input problems. It succeeds in some settings, but its effectiveness varies with the nature of the task.
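The two variants above can be sketched as follows. This is a schematic outline, not the paper's implementation: `generate` and `check` are hypothetical callables standing in for model sampling and final-answer verification, and the prompt template is an assumption.

```python
def reinforced_icl_examples(problems, generate, check, samples_per_problem=4):
    """Reinforced ICL sketch: replace human-written rationales with
    model-generated ones, keeping only rationales whose final answer
    passes the correctness check."""
    kept = []
    for problem, answer in problems:
        for _ in range(samples_per_problem):
            rationale, final = generate(problem)
            if check(final, answer):
                kept.append((problem, rationale))
                break  # one verified rationale per problem suffices here
    return kept

def unsupervised_icl_prompt(problems, query):
    """Unsupervised ICL sketch: show domain problems with no solutions
    at all, then pose the query."""
    body = "\n\n".join(f"Problem: {p}" for p in problems)
    return f"{body}\n\nProblem: {query}\nSolution:"
```

Reinforced ICL still needs a way to check answers (e.g. ground-truth labels or an automatic verifier), whereas Unsupervised ICL needs only the unlabeled problems themselves.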

Overcoming Pre-Training Biases

Through extensive testing, the paper illustrates that many-shot configurations allow LLMs to override biases instilled during pre-training, provided enough examples are supplied. This finding matters for deploying LLMs in scenarios where neutrality and freedom from pre-existing biases are critical.

Further Implications and Future Work

The findings encourage further exploration into optimal configurations of in-context learning, particularly regarding the number of examples and their arrangement within prompts. There is also a need to refine our understanding of why performance may plateau or decrease with an excessive number of examples. Continual advancements in the development of longer-context models will play a pivotal role in harnessing the full potential of many-shot ICL, which could redefine the operational capabilities of LLMs across diverse domains.

Authors (15)
  1. Rishabh Agarwal (47 papers)
  2. Avi Singh (21 papers)
  3. Lei M. Zhang (9 papers)
  4. Bernd Bohnet (21 papers)
  5. Stephanie Chan (23 papers)
  6. Ankesh Anand (13 papers)
  7. Zaheer Abbas (11 papers)
  8. Azade Nova (13 papers)
  9. John D. Co-Reyes (16 papers)
  10. Eric Chu (17 papers)
  11. Feryal Behbahani (18 papers)
  12. Aleksandra Faust (60 papers)
  13. Hugo Larochelle (87 papers)
  14. Luis Rosias (1 paper)
  15. Biao Zhang (76 papers)
Citations (63)
