
Language Models can Exploit Cross-Task In-context Learning for Data-Scarce Novel Tasks (2405.10548v3)

Published 17 May 2024 in cs.CL

Abstract: LLMs have transformed NLP with their remarkable In-context Learning (ICL) capabilities. Automated assistants based on LLMs are gaining popularity; however, adapting them to novel tasks is still challenging. While colossal models excel in zero-shot performance, their computational demands limit widespread use, and smaller LLMs struggle without context. This paper investigates whether LLMs can generalize from labeled examples of predefined tasks to novel tasks. Drawing inspiration from biological neurons and the mechanistic interpretation of the Transformer architecture, we explore the potential for information sharing across tasks. We design a cross-task prompting setup with three LLMs and show that LLMs achieve significant performance improvements despite no examples from the target task in the context. Cross-task prompting leads to a remarkable performance boost of 107% for LLaMA-2 7B, 18.6% for LLaMA-2 13B, and 3.2% for GPT 3.5 on average over zero-shot prompting, and performs comparable to standard in-context learning. The effectiveness of generating pseudo-labels for in-task examples is demonstrated, and our analyses reveal a strong correlation between the effect of cross-task examples and model activation similarities in source and target input tokens. This paper offers a first-of-its-kind exploration of LLMs' ability to solve novel tasks based on contextual signals from different task examples.

Analysis of Cross-Task Prompting Capabilities in LLMs

The paper "LLMs can Learn In-context from Cross-task Prompts," investigates the capability of LLMs to generalize across tasks when exposed to labeled examples from different task domains. This work explores the concept of Cross-task Prompting within the domain of In-Context Learning (ICL), where LLMs are exemplified by their ability to infer tasks without explicit training updates. The authors focus on evaluating whether LLMs can leverage examples from a task library to perform significantly better on tasks for which they have no specific training data, offering an alternative approach to standard ICL practices.

Motivation and Challenges

The work is motivated by two challenges: the high computational cost of running colossal models in zero-shot settings, and the weak performance of smaller models when no in-context examples are available. The proposed direction draws on an analogy with biological neural pathways, where training one limb can transfer skill to the untrained limb (cross education). Combined with mechanistic interpretations of the Transformer architecture, this suggests that learned internal pathways might be shared across tasks, which would help explain the adaptability observed in LLMs.

Methodology

The authors design a Cross-task Prompting setup with three LLMs: LLaMA-2 7B, LLaMA-2 13B, and GPT 3.5. Experiments are conducted over pairs of tasks, with one task supplying the in-context examples (the source) and another serving as the target. Central to the methodology is selecting source examples that are semantically similar to the target input, so that the prompt context is as relevant as possible (a minimal sketch of this step follows). The design also includes controlled comparison configurations: semantic-similarity selection, random instance selection, and label randomization.
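To make the example-selection and prompt-construction step concrete, the following is a minimal sketch assuming a Sentence-BERT-style encoder from the sentence-transformers library and a generic Input/Output template; the model name and template are illustrative choices, not the authors' exact configuration.

```python
# Minimal sketch of cross-task prompt construction: source-task demonstrations
# are retrieved by semantic similarity to the target input. Model name and
# prompt template are illustrative, not the paper's exact setup.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def build_cross_task_prompt(target_input, source_pool, k=4):
    """source_pool: list of (text, label) pairs from a *different* (source) task."""
    # Embed the target input and all candidate source examples.
    target_emb = encoder.encode(target_input, convert_to_tensor=True)
    source_embs = encoder.encode([text for text, _ in source_pool], convert_to_tensor=True)
    # Retrieve the k source examples most similar to the target input.
    hits = util.semantic_search(target_emb, source_embs, top_k=k)[0]
    demos = [source_pool[h["corpus_id"]] for h in hits]
    # Assemble the prompt: source-task demonstrations followed by the
    # unlabeled target input.
    parts = [f"Input: {text}\nOutput: {label}\n" for text, label in demos]
    parts.append(f"Input: {target_input}\nOutput:")
    return "\n".join(parts)
```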

Results

  • Performance Boosts: Across all models, Cross-task Prompting delivered consistent improvements over zero-shot prompting, averaging 107% for LLaMA-2 7B, 18.6% for LLaMA-2 13B, and 3.2% for GPT 3.5. Notably, it performs comparably to standard ICL despite using no examples from the target task.
  • Dependence on Source Tasks: Efficacy varies with the source-target pairing. Some source tasks, such as ARC-Easy, consistently improved target performance, indicating better alignment and domain coverage; others, such as CoNLL2003-POS, offered minimal gains, suggesting that domain specificity constrains information transfer.
  • Robustness of Prompting Techniques: Increasing the number of source-task examples did not necessarily yield better results. This contrasts with typical ICL setups, where more examples usually help, and highlights a limitation of cross-task transfer.
  • Pseudo-label Generation: Using Cross-task Prompting to generate pseudo-labels for unlabeled in-task examples produced marked improvements over zero-shot predictions, often rivaling performance with gold labels, underscoring the approach's value when labeled data is scarce (a minimal sketch follows this list).
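To illustrate the pseudo-labelling procedure, the sketch below reuses the build_cross_task_prompt helper from the Methodology section and assumes a placeholder llm_generate completion function; both the helper and the two-stage flow are illustrative rather than the paper's released implementation.

```python
# Sketch of the pseudo-labelling variant: cross-task prompting first labels a
# handful of unlabeled target-task examples, which then serve as ordinary
# in-task demonstrations. `llm_generate` is a placeholder for any LLM
# completion call (e.g. a LLaMA-2 or GPT-3.5 wrapper), not an API from the paper.
def pseudo_label_then_icl(unlabeled_targets, source_pool, test_input, llm_generate, k=4):
    # Stage 1: assign pseudo-labels using only source-task demonstrations.
    pseudo_labeled = []
    for text in unlabeled_targets:
        prompt = build_cross_task_prompt(text, source_pool, k=k)
        pseudo_labeled.append((text, llm_generate(prompt).strip()))

    # Stage 2: standard ICL on the test input, with pseudo-labeled examples
    # playing the role of gold in-task demonstrations.
    demos = "\n".join(f"Input: {t}\nOutput: {y}\n" for t, y in pseudo_labeled[:k])
    return llm_generate(f"{demos}\nInput: {test_input}\nOutput:")
```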

Implications and Future Directions

This investigation into Cross-task Prompting demonstrates the potential for LLMs to become more versatile and accessible across various applications, reducing dependency on extensive task-specific data. The method exemplifies a critical step toward achieving training-free task generalization in AI, advancing the efficiency of LLMs in diverse application areas.

Looking forward, more sophisticated source-target alignment strategies could further improve Cross-task Prompting. Identifying the shared internal circuits that Transformer models reuse across tasks could make such transfer more predictable and efficient, opening opportunities for more generalizable systems. The work also motivates further study of how semantic and contextual signals carry over between seemingly disparate tasks. Future work should focus on improving LLM interpretability and on characterizing the limits imposed by task dissimilarities not covered by the current datasets.

Conclusion

The paper addresses a key limitation in the LLM landscape, proposing Cross-task Prompting as a viable route to adapting LLMs to novel tasks without target-task labels. Beyond its gains in efficiency and applicability, the work lays foundations for task-generalization strategies likely to shape future model development and deployment.

Authors (4)
  1. Anwoy Chatterjee
  2. Eshaan Tanwar
  3. Subhabrata Dutta
  4. Tanmoy Chakraborty