
Grimoire is All You Need for Enhancing Large Language Models (2401.03385v2)

Published 7 Jan 2024 in cs.CL

Abstract: In-context learning (ICL) is one of the key methods for enhancing the performance of LLMs on specific tasks by providing a set of few-shot examples. However, the ICL capability of different types of models varies significantly due to factors such as model architecture, the volume of training data, and parameter size. Generally, the larger the model's parameter count and the more extensive its training data, the stronger its ICL capability. In this paper, we propose SLEICL, a method in which strong LLMs learn from examples, then summarize and transfer the learned skills to weak LLMs for inference and application. This ensures the stability and effectiveness of ICL. Compared to having weak LLMs learn directly from prompt examples, SLEICL reduces the difficulty of ICL for these models. Our experiments, conducted on up to eight datasets with five LLMs, demonstrate that weak LLMs achieve consistent improvements over their own zero-shot and few-shot performance using the SLEICL method. Some weak LLMs even surpass GPT-4-1106-preview (zero-shot) with the aid of SLEICL.
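
The following is a minimal, hypothetical sketch of the two-stage pipeline the abstract describes: a strong LLM first distills few-shot examples into a reusable skill summary (the "grimoire" of the title), which a weak LLM then consumes at inference time in place of raw examples. The `call_llm` helper and both prompt templates are illustrative placeholders, not the paper's actual prompts or API.

```python
# Sketch of the SLEICL pipeline described in the abstract: a strong LLM
# distills few-shot examples into a "grimoire" (a summary of task-solving
# skills), which a weak LLM then applies at inference time.
# NOTE: `call_llm` and the prompt wording below are hypothetical stand-ins,
# not the paper's actual implementation.

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a real LLM client call (e.g., a chat-completion endpoint)."""
    raise NotImplementedError("wire up your own LLM client here")

def distill_grimoire(strong_model: str, examples: list[tuple[str, str]]) -> str:
    """Stage 1: the strong LLM studies few-shot examples and summarizes
    the skills needed to solve the task."""
    shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    prompt = (
        "Study the examples below and write a concise guide (a 'grimoire') "
        "explaining how to solve this task, so that a less capable model "
        f"can follow it.\n\n{shots}"
    )
    return call_llm(strong_model, prompt)

def answer_with_grimoire(weak_model: str, grimoire: str, query: str) -> str:
    """Stage 2: the weak LLM answers using the transferred skill summary
    instead of the raw few-shot examples."""
    prompt = f"Task guide:\n{grimoire}\n\nInput: {query}\nOutput:"
    return call_llm(weak_model, prompt)
```

In practice, `strong_model` would be a GPT-4-class model and `weak_model` a smaller open model; the key design point is that only the distilled grimoire, not the examples themselves, is passed to the weak model, which is what reduces the ICL burden on it.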

Authors (7)
  1. Ding Chen
  2. Shichao Song
  3. Qingchen Yu
  4. Zhiyu Li
  5. Wenjin Wang
  6. Feiyu Xiong
  7. Bo Tang
Citations (4)