TEGEE: Task dEfinition Guided Expert Ensembling for Generalizable and Few-shot Learning (2403.04233v3)

Published 7 Mar 2024 in cs.CL and cs.AI

Abstract: LLMs exhibit the ability to perform in-context learning (ICL), where they acquire new tasks directly from examples provided in demonstrations. This process is thought to operate through an implicit task selection mechanism that involves extracting and processing task definitions from these demonstrations. However, critical questions remain: Which is more essential -- task extraction or definition? And how can these capabilities be further improved? To address these questions, we propose TEGEE (Task Definition Guided Expert Ensembling), a method that explicitly extracts task definitions and generates responses based on specific tasks. Our framework employs a dual 3B model approach, with each model assigned a distinct role: one focuses on task definition extraction, while the other handles learning from demonstrations. This modular approach supports the hypothesis that extracting task definitions is more vital than processing the task itself. Empirical evaluations show that TEGEE performs comparably to the larger LLaMA2-13B model. By leveraging a modular design, our approach extends traditional ICL from few-shot to many-shot learning, supporting an unlimited number of demonstrations and enhancing continual learning capabilities.
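The abstract only outlines the architecture, but the division of labor it describes -- one small model that distills the demonstrations into an explicit task definition, and a second model that answers queries conditioned on that definition -- can be sketched as a simple two-stage pipeline. The sketch below is an illustration based solely on the abstract: the prompt wording, the function names, and the `Model` callable abstraction are assumptions, not the authors' implementation.

```python
# Minimal sketch of a TEGEE-style two-model pipeline, reconstructed from the
# abstract alone. Prompts, names, and the Model abstraction are illustrative
# assumptions; they are not taken from the paper's code.

from typing import Callable, List, Tuple

# A "model" here is any text-to-text callable, e.g. a wrapper around a ~3B LLM.
Model = Callable[[str], str]


def extract_task_definition(extractor: Model, demos: List[Tuple[str, str]]) -> str:
    """Ask the first model to turn raw demonstrations into an explicit task definition."""
    demo_text = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    prompt = (
        "Read the examples and state, in one sentence, the task they demonstrate.\n"
        f"{demo_text}\nTask definition:"
    )
    return extractor(prompt).strip()


def answer_with_definition(solver: Model, task_definition: str,
                           demos: List[Tuple[str, str]], query: str) -> str:
    """Condition the second model on the extracted definition plus the demonstrations."""
    demo_text = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    prompt = f"Task: {task_definition}\n{demo_text}\nInput: {query}\nOutput:"
    return solver(prompt).strip()


def tegee_style_inference(extractor: Model, solver: Model,
                          demos: List[Tuple[str, str]], query: str) -> str:
    # Because the task definition is extracted explicitly, the demonstration pool
    # can in principle be processed in batches rather than packed into a single
    # context window (the abstract's few-shot to many-shot extension); that
    # extension is omitted here.
    definition = extract_task_definition(extractor, demos)
    return answer_with_definition(solver, definition, demos, query)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model weights.
    dummy_extractor = lambda prompt: "Translate English words to French."
    dummy_solver = lambda prompt: "chat"
    demos = [("dog", "chien"), ("house", "maison")]
    print(tegee_style_inference(dummy_extractor, dummy_solver, demos, "cat"))
```

The design choice the sketch tries to capture is the modularity the abstract emphasizes: the definition extractor and the demonstration learner are separate components, so either can be swapped or scaled independently, and new demonstrations can refresh the extracted definition without retraining the solver.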
