
Prompt-Augmented Linear Probing: Scaling beyond the Limit of Few-shot In-Context Learners (2212.10873v3)

Published 21 Dec 2022 in cs.CL and cs.LG

Abstract: Through in-context learning (ICL), large-scale LLMs are effective few-shot learners without additional model fine-tuning. However, ICL performance does not scale well with the number of available training samples, as it is limited by the inherent input length constraint of the underlying LLM. Meanwhile, many studies have revealed that LLMs are also powerful feature extractors, allowing them to be utilized in a black-box manner and enabling the linear probing paradigm, where lightweight discriminators are trained on top of the pre-extracted input representations. This paper proposes prompt-augmented linear probing (PALP), a hybrid of linear probing and ICL that leverages the best of both worlds. PALP inherits the scalability of linear probing and the capability of steering LLMs toward more meaningful representations by tailoring the input into a more conceivable form. Through in-depth investigations on various datasets, we verified that PALP significantly enhances the input representations, closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead, potentially making PALP a strong alternative in a black-box scenario.
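
The abstract describes PALP as prompt augmentation plus linear probing on frozen LLM features. Below is a minimal sketch of that pipeline, assuming a HuggingFace causal LM (here GPT-2) as the black-box feature extractor and scikit-learn's LogisticRegression as the lightweight discriminator; the prompt template and last-token pooling are illustrative choices, not the authors' exact configuration.

```python
# Sketch of prompt-augmented linear probing (PALP): wrap each input in a task
# prompt, extract a representation from a frozen LM, and train a linear probe.
# Model, template, and pooling here are assumptions for illustration only.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()  # the LM stays frozen; only the probe is trained

def palp_features(texts, template="Review: {x} Sentiment:"):
    """Prompt-augment each input, then use the LM's last-token hidden state
    as its representation."""
    feats = []
    with torch.no_grad():
        for x in texts:
            enc = tokenizer(template.format(x=x), return_tensors="pt")
            hidden = model(**enc).last_hidden_state   # (1, seq_len, dim)
            feats.append(hidden[0, -1].numpy())       # last-token pooling
    return feats

# Unlike ICL, the probe scales to arbitrarily many labelled examples,
# since they are never packed into the LM's context window.
train_texts = ["a gripping, beautifully shot film", "a dull, lifeless mess"]
train_labels = [1, 0]
probe = LogisticRegression(max_iter=1000).fit(palp_features(train_texts), train_labels)
print(probe.predict(palp_features(["an unexpectedly moving story"])))
```

The key design point the abstract highlights is that the prompt template does the "tailoring": the same frozen extractor yields more task-relevant features when the input is phrased as a prompt than when the raw text is embedded directly.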

Authors (7)
  1. Hyunsoo Cho (28 papers)
  2. Hyuhng Joon Kim (11 papers)
  3. Junyeob Kim (7 papers)
  4. Sang-Woo Lee (34 papers)
  5. Sang-goo Lee (40 papers)
  6. Kang Min Yoo (40 papers)
  7. Taeuk Kim (38 papers)
Citations (21)
