
Rectifying Demonstration Shortcut in In-Context Learning (2403.09488v3)

Published 14 Mar 2024 in cs.CL and cs.AI

Abstract: LLMs can solve a variety of tasks from only a few demonstrations by exploiting their in-context learning (ICL) abilities. However, LLMs often rely on their pre-trained semantic priors about the demonstrations, rather than on the input-label relationships the demonstrations exhibit, when making ICL predictions. In this work, we term this phenomenon the 'Demonstration Shortcut'. While previous works have primarily focused on improving ICL prediction results for predefined tasks, we aim to rectify the Demonstration Shortcut, thereby enabling an LLM to effectively learn new input-label relationships from demonstrations. To achieve this, we introduce In-Context Calibration, a demonstration-aware calibration method. We evaluate the effectiveness of the proposed method in two settings: (1) the Original ICL Task, which uses the standard label space, and (2) the Task Learning setting, where the label space is replaced with semantically unrelated tokens. In both settings, In-Context Calibration yields substantial improvements, with results generalizing across three LLM families (OPT, GPT, and Llama2) under various configurations.
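For intuition, the sketch below illustrates the calibration family this method belongs to, in the spirit of contextual calibration (Zhao et al., ICML 2021): the label distribution the model assigns to a content-free input (e.g., "N/A") appended after the demonstrations serves as an estimate of the prompt-induced prior, which is then divided out of the test-time prediction. This is a minimal illustration under stated assumptions only; the probabilities, label set, and helper name below are made up, and In-Context Calibration's demonstration-aware estimator differs in its details.

```python
import numpy as np

def calibrated_prediction(test_probs, bias_probs, eps=1e-12):
    """Divide the model's test-time label probabilities by a bias
    estimate obtained from a content-free input, then renormalize.
    Illustrative helper; not the paper's exact estimator."""
    scores = np.asarray(test_probs, dtype=float) / (np.asarray(bias_probs, dtype=float) + eps)
    return scores / scores.sum()

# Toy numbers: the demonstrations bias the model toward "positive".
labels = ["negative", "positive"]
bias_probs = [0.20, 0.80]   # p(label | demonstrations + content-free input)
test_probs = [0.45, 0.55]   # p(label | demonstrations + real test input)

print(labels[int(np.argmax(test_probs))])    # "positive" (uncalibrated)
print(labels[int(np.argmax(calibrated_prediction(test_probs, bias_probs)))])  # "negative" (calibrated)
```

In the Task Learning setting described above, the same scoring would apply, except that the label tokens being scored are semantically unrelated symbols rather than words like "negative"/"positive", so the model cannot fall back on its semantic priors.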

Authors (5)
  1. Joonwon Jang (5 papers)
  2. Sanghwan Jang (7 papers)
  3. Wonbin Kweon (16 papers)
  4. Minjin Jeon (2 papers)
  5. Hwanjo Yu (57 papers)
Citations (1)