Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language Models (2402.15637v2)

Published 23 Feb 2024 in cs.CL

Abstract: In-context learning has become a popular paradigm in natural language processing. However, its performance can be significantly influenced by the order of in-context demonstration examples. In this paper, we find that causal language models (CausalLMs) are more sensitive to this order than prefix language models (PrefixLMs). We attribute this phenomenon to the auto-regressive attention masks within CausalLMs, which restrict each token from accessing information from subsequent tokens. This results in different receptive fields for samples at different positions, thereby leading to representation disparities across positions. To tackle this challenge, we introduce an unsupervised fine-tuning method, termed the Information-Augmented and Consistency-Enhanced approach. This approach utilizes contrastive learning to align representations of in-context examples across different positions and introduces a consistency loss to ensure similar representations for inputs with different permutations. This enhances the model's predictive consistency across permutations. Experimental results on five benchmarks suggest that our proposed method can reduce the sensitivity of CausalLMs to the order of in-context examples and exhibit robust generalizability, particularly when demonstrations are sourced from a candidate pool different from that used in the training phase, or when the number of in-context examples differs from what is used during training.
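
To make the two training signals in the abstract concrete, below is a minimal PyTorch sketch of (a) a contrastive loss that aligns representations of the same in-context example placed at different positions and (b) a consistency loss that pulls together the predictive distributions produced by different permutations of the same demonstrations. This is an illustrative interpretation, not the authors' implementation: the function names, the InfoNCE-style contrastive formulation, the KL-to-mean consistency term, and the temperature value are all assumptions made for the sketch.

```python
# Minimal sketch (not the authors' released code) of the two unsupervised
# training signals described in the abstract. All names, the InfoNCE-style
# contrastive formulation, the KL-to-mean consistency term, and the
# temperature are illustrative assumptions.
import math

import torch
import torch.nn.functional as F


def contrastive_alignment_loss(reps_a: torch.Tensor,
                               reps_b: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: reps_a[i] and reps_b[i] are hidden-state
    representations of the same demonstration example placed at two different
    positions in the prompt; other rows in the batch act as negatives."""
    a = F.normalize(reps_a, dim=-1)
    b = F.normalize(reps_b, dim=-1)
    logits = a @ b.t() / temperature                     # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)   # positives on the diagonal
    return F.cross_entropy(logits, targets)


def permutation_consistency_loss(perm_logits: torch.Tensor) -> torch.Tensor:
    """Average KL divergence from each permutation's predictive distribution
    to their mean, pushing predictions for different demonstration orders
    toward agreement. perm_logits: (num_perms, num_labels) label logits."""
    log_probs = F.log_softmax(perm_logits, dim=-1)
    mean_log_probs = torch.logsumexp(log_probs, dim=0) - math.log(perm_logits.size(0))
    return F.kl_div(mean_log_probs.expand_as(log_probs), log_probs,
                    log_target=True, reduction="batchmean")


if __name__ == "__main__":
    # Random tensors stand in for a CausalLM's hidden states and label logits.
    reps_a, reps_b = torch.randn(8, 768), torch.randn(8, 768)
    perm_logits = torch.randn(4, 5)   # 4 demo permutations, 5 candidate labels
    total = (contrastive_alignment_loss(reps_a, reps_b)
             + permutation_consistency_loss(perm_logits))
    print(f"combined unsupervised loss: {total.item():.4f}")
```

In a full fine-tuning loop, these terms would be computed from the CausalLM's own hidden states and label logits and combined with the training objective; the random tensors above only stand in for those model outputs.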
