In-Context Learning Learns Label Relationships but Is Not Conventional Learning (2307.12375v4)

Published 23 Jul 2023 in cs.CL, cs.AI, and cs.LG

Abstract: The predictions of LLMs on downstream tasks often improve significantly when including examples of the input--label relationship in the context. However, there is currently no consensus about how this in-context learning (ICL) ability of LLMs works. For example, while Xie et al. (2021) liken ICL to a general-purpose learning algorithm, Min et al. (2022) argue ICL does not even learn label relationships from in-context examples. In this paper, we provide novel insights into how ICL leverages label information, revealing both capabilities and limitations. To ensure we obtain a comprehensive picture of ICL behavior, we study probabilistic aspects of ICL predictions and thoroughly examine the dynamics of ICL as more examples are provided. Our experiments show that ICL predictions almost always depend on in-context labels and that ICL can learn truly novel tasks in-context. However, we also find that ICL struggles to fully overcome prediction preferences acquired from pre-training data and, further, that ICL does not consider all in-context information equally.

The paper "In-Context Learning Learns Label Relationships but Is Not Conventional Learning" investigates the inner workings of in-context learning (ICL) in LLMs. Here is an overview:

The authors aim to unpack the mechanisms behind ICL, particularly how LLMs utilize examples provided in the input to make predictions on downstream tasks. Despite the significant improvements seen in these tasks when contextual examples are added, there remains a lack of consensus on precisely how this process functions.
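To make the setup concrete, a typical ICL prompt simply concatenates labeled demonstrations before an unlabeled query. The sketch below is illustrative, not the authors' code; the task template, helper names, and example texts are all assumptions.

```python
# Minimal sketch of few-shot ICL prompt construction (hypothetical template).

def format_example(text: str, label: str) -> str:
    # One demonstration of the input--label relationship.
    return f"Review: {text}\nSentiment: {label}"

def build_icl_prompt(demos, query_text):
    """Concatenate in-context demonstrations, then the unlabeled query."""
    parts = [format_example(t, y) for t, y in demos]
    parts.append(f"Review: {query_text}\nSentiment:")
    return "\n\n".join(parts)

demos = [("Loved every minute.", "positive"),
         ("A tedious, joyless slog.", "negative")]
prompt = build_icl_prompt(demos, "Surprisingly charming.")
print(prompt)
```

The model is then asked to continue the prompt, and its next-token distribution over the label words serves as its prediction for the query.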

Key Contributions and Findings:

  1. ICL as a Learning Mechanism:
    • The paper examines contrasting viewpoints from previous research: some argue that ICL acts like a general-purpose learning algorithm, while others contend that it does not truly learn label relationships from in-context examples.
    • Through their research, the authors present novel insights suggesting that while ICL does leverage label information, it operates differently from conventional learning methods.
  2. Probabilistic Analysis:
    • The paper conducts a detailed analysis of the probabilistic nature of ICL predictions. This involves understanding how predictions adapt with the inclusion of more in-context examples.
    • It is revealed that ICL predictions heavily rely on the labels included in the context, indicating that these labels significantly guide prediction outcomes.
  3. Capabilities and Limitations:
    • The research shows that ICL can indeed learn and adapt to new tasks using the information from the context. This demonstrates a form of learning that captures novel task structures dynamically.
    • However, there are notable limitations: ICL struggles to fully override the prediction biases acquired during pre-training, and it does not consider all in-context information equally.
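A simple way to probe whether predictions depend on in-context labels, in the spirit of the experiments the paper describes (though not the authors' actual code), is to score the same query under true versus flipped demonstration labels. The template and helper names below are hypothetical.

```python
# Hypothetical label-flip probe: if ICL uses in-context labels, scoring the
# query under true vs. flipped demonstrations should shift the prediction.

FLIP = {"positive": "negative", "negative": "positive"}

def build_prompt(demos, query):
    lines = [f"Review: {t}\nSentiment: {y}" for t, y in demos]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

def flip_labels(demos):
    # Invert every demonstration label while keeping the inputs fixed.
    return [(t, FLIP[y]) for t, y in demos]

demos = [("Loved every minute.", "positive"),
         ("A tedious, joyless slog.", "negative")]

true_prompt = build_prompt(demos, "Surprisingly charming.")
flipped_prompt = build_prompt(flip_labels(demos), "Surprisingly charming.")
# Feed both prompts to an LLM and compare p("positive") vs. p("negative")
# for the query; a gap between the two conditions indicates label sensitivity,
# while residual preference for the true label reflects pre-training bias.
```

Repeating this comparison as the number of demonstrations grows traces out the ICL dynamics the paper studies.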

The paper contributes to advancing the understanding of how ICL functions within LLMs. By identifying both the capabilities and inherent constraints of in-context learning, the paper provides a foundation for future work aimed at refining and improving how LLMs can be taught to leverage contextual information more effectively. This is crucial for developing models that can generalize better and adapt more fluidly to new tasks and data structures.

References (69)
  1. In-context examples selection for machine translation. In ACL, 2023. URL https://arxiv.org/abs/2212.02437.
  2. What learning algorithm is in-context learning? investigations with linear models. In ICLR, 2023. URL https://arxiv.org/abs/2211.15661.
  3. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv:2204.05862, 2022a. URL https://arxiv.org/abs/2204.05862.
  4. Constitutional ai: Harmlessness from ai feedback. arXiv:2212.08073, 2022b. URL https://arxiv.org/abs/2212.08073.
  5. Language models are few-shot learners. NeurIPS, 2020. URL https://arxiv.org/abs/2005.14165.
  6. Data distributional properties drive emergent in-context learning in transformers. NeurIPS, 2022a. URL https://arxiv.org/abs/2205.05055.
  7. Transformers generalize differently from information stored in context vs in weights. arXiv:2210.05675, 2022b. URL https://arxiv.org/abs/2210.05675.
  8. Careful data curation stabilizes in-context learning. In EMNLP, 2022. URL https://arxiv.org/abs/2212.10378.
  9. On the relation between sensitivity and accuracy in in-context learning. arXiv:2209.07661, 2022. URL https://arxiv.org/abs/2209.07661.
  10. Palm: Scaling language modeling with pathways. arXiv:2204.02311, 2022. URL https://arxiv.org/abs/2204.02311.
  11. Deep reinforcement learning from human preferences. NeurIPS, 2017. URL https://arxiv.org/abs/1706.03741.
  12. The pascal recognising textual entailment challenge. In Machine learning challenges workshop, 2005. URL https://link.springer.com/chapter/10.1007/11736790_9.
  13. Distinguishing rule and exemplar-based generalization in learning systems. In ICML, 2022. URL https://arxiv.org/abs/2110.04328.
  14. Hate Speech Dataset from a White Supremacy Forum. In ACL Workshop on Abusive Language Online (ALW2), 2018. URL https://www.aclweb.org/anthology/W18-5102.
  15. Automatically constructing a corpus of sentential paraphrases. In Third International Workshop on Paraphrasing (IWP2005), 2005. URL https://aclanthology.org/I05-5002/.
  16. On the marginal likelihood and cross-validation. Biometrika, 2020. URL https://arxiv.org/abs/1905.08737.
  17. Neural processes. arXiv:1807.01622, 2018. URL https://arxiv.org/abs/1807.01622.
  18. Demystifying prompts in language models via perplexity estimation. arXiv:2212.04037, 2022. URL https://arxiv.org/abs/2212.04037.
  19. Meta-learning probabilistic inference for prediction. In ICLR, 2019. URL https://arxiv.org/abs/1805.09921.
  20. A theory of emergent in-context learning as implicit structure induction. arXiv:2303.07971, 2023. URL https://arxiv.org/abs/2303.07971.
  21. In-context learning of large language models explained as kernel regression. arXiv:2305.12766, 2023. URL https://arxiv.org/abs/2305.12766.
  22. Training compute-optimal large language models. In NeurIPS, 2022. URL https://arxiv.org/abs/2203.15556.
  23. Ferenc Huszár. Implicit bayesian inference in large language models, 2023. URL https://www.inference.vc/implicit-bayesian-inference-in-sequence-models/. [Online; accessed 10-July-2023].
  24. Hui Jiang. A latent space theory for emergent abilities in large language models. arXiv:2304.09960, 2023. URL https://arxiv.org/abs/2304.09960.
  25. Language models (mostly) know what they know. arXiv:2207.05221, 2022. URL https://arxiv.org/abs/2207.05221.
  26. General-purpose in-context learning by meta-learning transformers. arXiv:2212.04458, 2022. URL https://arxiv.org/abs/2212.04458.
  27. Self-attention between datapoints: Going beyond individual input-output pairs in deep learning. In NeurIPS, 2021. URL https://arxiv.org/abs/2106.02584.
  28. The winograd schema challenge. In Thirteenth international conference on the principles of knowledge representation and reasoning, 2012. URL https://cdn.aaai.org/ocs/4492/4492-21843-1-PB.pdf.
  29. Finding supporting examples for in-context learning. arXiv:2302.13539, 2023. URL https://arxiv.org/abs/2302.13539.
  30. Holistic evaluation of language models. arXiv:2211.09110, 2022. URL https://arxiv.org/abs/2211.09110.
  31. Teaching models to express their uncertainty in words. TMLR, 2023. URL https://arxiv.org/abs/2205.14334.
  32. What makes good in-context examples for GPT-3? In ACL, 2022. URL https://arxiv.org/abs/2101.06804.
  33. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 2023. URL https://arxiv.org/abs/2107.13586.
  34. Generating wikipedia by summarizing long sequences. In ICLR, 2018. URL https://arxiv.org/abs/1801.10198.
  35. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In ACL, 2022. URL https://arxiv.org/abs/2104.08786.
  36. Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology, 2014. URL https://arxiv.org/abs/1307.5336.
  37. Effective transfer learning for identifying similar questions: Matching user questions to covid-19 faqs. arXiv:2008.13546, 2020. URL https://arxiv.org/abs/2008.13546.
  38. Metaicl: Learning to learn in context. In NAACL, 2022a. URL https://arxiv.org/abs/2110.15943.
  39. Rethinking the role of demonstrations: What makes in-context learning work? In ACL, 2022b. URL https://arxiv.org/abs/2202.12837.
  40. Transformers can do bayesian inference. In ICLR, 2022. URL https://arxiv.org/abs/2112.10510.
  41. Kevin P Murphy. Probabilistic machine learning: an introduction. MIT press, 2022. URL https://probml.github.io/pml-book/book1.html.
  42. Training language models to follow instructions with human feedback. NeurIPS, 2022. URL https://arxiv.org/abs/2203.02155.
  43. What in-context learning ”learns” in-context: Disentangling task recognition and task learning. In ACL, 2023. URL https://arxiv.org/abs/2305.09731.
  44. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019. URL https://pytorch.org/.
  45. Improving language understanding with unsupervised learning. Technical report, OpenAI, 2018. URL https://openai.com/research/language-unsupervised.
  46. Language models are unsupervised multitask learners. OpenAI blog, 2019. URL https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
  47. Impact of pretraining term frequencies on few-shot reasoning. In EMNLP, 2022. URL https://arxiv.org/abs/2202.07206.
  48. Measuring inductive biases of in-context learning with underspecified demonstrations. In ACL, 2023. URL https://arxiv.org/abs/2305.13299.
  49. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, 2013. URL https://aclanthology.org/D13-1170/.
  50. Efstathios Stamatatos. A survey of modern authorship attribution methods. American Society for information Science and Technology, 2009. URL https://onlinelibrary.wiley.com/doi/10.1002/asi.21001.
  51. Technology Innovation Institute TII. Falcon llm, 2023. URL https://falconllm.tii.ae/. [Online; accessed 10-July-2023].
  52. Llama: Open and efficient foundation language models. arXiv:2302.13971, 2023a. URL https://arxiv.org/abs/2302.13971.
  53. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288, 2023b. URL https://arxiv.org/abs/2307.09288.
  54. Transformers learn in-context by gradient descent. In ICML, 2023. URL https://arxiv.org/abs/2212.07677.
  55. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In ICLR, 2019. URL https://arxiv.org/abs/1804.07461.
  56. Baselines and bigrams: Simple, good sentiment and topic classification. In ACL, 2012. URL https://dl.acm.org/doi/10.5555/2390665.2390688.
  57. Larger language models do in-context learning differently. arXiv:2303.03846, 2023. URL https://arxiv.org/abs/2303.03846.
  58. The learnability of in-context learning. arXiv:2303.07895, 2023. URL https://arxiv.org/abs/2303.07895.
  59. A learning algorithm for continually running fully recurrent neural networks. Neural computation, 1989. URL https://ieeexplore.ieee.org/document/6795228.
  60. Huggingface’s transformers: State-of-the-art natural language processing. In EMNLP: System Demonstrations, 2020. URL https://arxiv.org/abs/1910.03771.
  61. Reasoning or reciting? exploring the capabilities and limitations of language models through counterfactual tasks. arXiv:2307.02477, 2023. URL https://arxiv.org/abs/2307.02477.
  62. An explanation of in-context learning as implicit bayesian inference. In ICLR, 2022. URL https://arxiv.org/abs/2111.02080.
  63. Ground-truth labels matter: A deeper look into input-label demonstrations. In EMNLP, 2022. URL https://arxiv.org/abs/2205.12685.
  64. Opt: Open pre-trained transformer language models. arXiv:2205.01068, 2022a. URL https://arxiv.org/abs/2205.01068.
  65. Character-level convolutional networks for text classification. In NeurIPS, 2015. URL https://arxiv.org/abs/1509.01626.
  66. Active example selection for in-context learning. In EMNLP, 2022b. URL https://arxiv.org/abs/2211.04486.
  67. What and how does in-context learning learn? bayesian model averaging, parameterization, and generalization. arXiv:2305.19420, 2023. URL https://arxiv.org/abs/2305.19420.
  68. Calibrate before use: Improving few-shot performance of language models. In ICML, 2021. URL https://arxiv.org/abs/2102.09690.
  69. Fine-tuning language models from human preferences. arXiv:1909.08593, 2019. URL https://arxiv.org/abs/1909.08593.
Authors (3)
  1. Jannik Kossen (14 papers)
  2. Yarin Gal (170 papers)
  3. Tom Rainforth (62 papers)
Citations (21)