Multitask Prompted Training Enables Zero-Shot Task Generalization: An Overview
This paper is an empirical investigation into whether zero-shot task generalization can be induced through explicit multitask learning, rather than relying solely on the implicit multitask learning that occurs during the pretraining of LLMs. Specifically, the work asks whether fine-tuning an LLM on a diverse set of tasks, each formatted as a natural language prompt, enhances the model's zero-shot performance on unseen tasks.
Abstract
The authors present a framework for converting diverse NLP tasks into human-readable prompts and demonstrate the efficacy of their approach by fine-tuning a T5-based model on multiple datasets. The results show that the fine-tuned model, termed T0, achieves competitive zero-shot performance, often surpassing models significantly larger in size.
Introduction
The strong zero-shot generalization of LLMs has commonly been attributed to implicit multitask learning during pretraining, where the model is exposed to a wide range of tasks embedded in its vast training corpus. This paper examines whether explicitly fine-tuning an LLM on a mixture of NLP tasks, each formulated through prompts, can enhance zero-shot generalization without resorting to massive model sizes.
Methodology
The authors develop a unified prompt format backed by a templating language that maps the fields of a dataset instance to an input text and a target text, making it easy to convert existing datasets into prompted examples. They collect a broad set of prompts from public contributors to build a diverse training mixture, emphasizing varied wording to improve robustness to different prompt formulations.
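To make the templating idea concrete, the sketch below renders a hypothetical NLI instance into a prompted (input, target) pair using a Jinja-style template. The template text, field names, label mapping, and the `render_prompt` helper are illustrative assumptions, not the paper's exact tooling, though the "|||" separator follows a convention similar to the one used by public prompt-collection tools.

```python
# A minimal sketch of prompt templating (illustrative only; the template text,
# field names, and helper below are assumptions, not the paper's exact tooling).
from jinja2 import Template

# One hypothetical template for a natural language inference (NLI) instance.
# "|||" separates the model input from the expected target.
NLI_TEMPLATE = Template(
    "{{ premise }}\n"
    "Question: Does the previous passage imply that \"{{ hypothesis }}\"? "
    "Yes, no, or maybe? ||| {{ answer }}"
)

def render_prompt(example: dict) -> tuple[str, str]:
    """Turn a raw dataset instance into a prompted (input, target) pair."""
    label_to_answer = {0: "Yes", 1: "Maybe", 2: "No"}  # example label mapping
    text = NLI_TEMPLATE.render(
        premise=example["premise"],
        hypothesis=example["hypothesis"],
        answer=label_to_answer[example["label"]],
    )
    input_text, target_text = (part.strip() for part in text.split("|||"))
    return input_text, target_text

example = {
    "premise": "A dog is sleeping on the porch.",
    "hypothesis": "An animal is resting.",
    "label": 0,
}
print(render_prompt(example))
```

Because many different templates can be written for the same dataset, the same instance can contribute several differently worded training examples, which is how the prompt diversity discussed later is obtained.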
The main variant, T0, is obtained by multitask fine-tuning a pretrained encoder-decoder model (T5+LM, a T5 variant further trained with a language modeling objective) on the newly generated prompted datasets. The authors also examine variants T0+ and T0++, trained on progressively larger sets of datasets, to explore how increased training-task diversity affects performance.
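At its core, this fine-tuning step is standard sequence-to-sequence training on a mixture of prompted (input, target) pairs drawn from many source tasks. The sketch below shows one training step with Hugging Face Transformers; the small checkpoint name and the toy mixture are assumptions chosen for brevity (the paper fine-tunes a much larger LM-adapted T5 on a far bigger mixture).

```python
# Sketch of multitask prompted fine-tuning as plain seq2seq training on a
# mixture of prompted (input, target) pairs. The checkpoint and the toy
# "mixture" below are illustrative assumptions, not the paper's exact setup.
import random
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/t5-small-lm-adapt"  # small LM-adapted T5 for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# A toy mixture: prompted examples drawn from different source tasks.
mixture = [
    ("Review: This movie was fantastic. Is the review positive or negative?",
     "positive"),
    ("A dog is sleeping on the porch. Question: Does this imply that "
     "\"An animal is resting\"? Yes, no, or maybe?",
     "Yes"),
    ("Summarize: The city council met on Tuesday to discuss the new budget.",
     "The council discussed the budget."),
]

def train_step(batch):
    inputs, targets = zip(*batch)
    enc = tokenizer(list(inputs), padding=True, truncation=True,
                    return_tensors="pt")
    labels = tokenizer(list(targets), padding=True, truncation=True,
                       return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Sample examples across tasks so every update mixes several source datasets.
print(train_step(random.sample(mixture, k=2)))
```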
Evaluation
The authors benchmark T0's zero-shot performance on several held-out task categories, including natural language inference (NLI), coreference resolution, word sense disambiguation, and sentence completion, and extend the evaluation to novel tasks from the BIG-bench benchmark. T0's performance is compared against several models, including GPT-3 at various sizes.
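For held-out tasks with a fixed set of answer options, the paper evaluates via rank classification: each candidate answer is scored by its log-likelihood under the model, and the highest-scoring option is taken as the prediction. The sketch below illustrates one way such scoring could look using a publicly released T0 checkpoint; the helper and its loss-based (length-normalized) scoring are assumptions rather than the paper's evaluation code.

```python
# Sketch of rank classification for zero-shot evaluation: score each fixed
# answer option by its (length-normalized here) log-likelihood and pick the
# best. The helper below is an illustrative assumption, not the paper's code.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "bigscience/T0_3B"  # publicly released smaller T0 variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

@torch.no_grad()
def rank_classify(prompt: str, options: list[str]) -> str:
    scores = []
    for option in options:
        enc = tokenizer(prompt, return_tensors="pt")
        labels = tokenizer(option, return_tensors="pt").input_ids
        # The seq2seq loss is the mean negative log-likelihood of the option
        # tokens, so a lower loss means a more likely answer.
        loss = model(**enc, labels=labels).loss
        scores.append(-loss.item())
    return options[scores.index(max(scores))]

prompt = ("A dog is sleeping on the porch. Question: Does this imply that "
          '"An animal is resting"? Yes, no, or maybe?')
print(rank_classify(prompt, ["Yes", "No", "Maybe"]))
```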
Results
- Zero-Shot Generalization:
  - T0 consistently outperforms the T5+LM baseline (which receives no multitask training) across several held-out tasks.
  - Notably, T0 often matches or surpasses GPT-3 models of up to 175B parameters despite being significantly smaller (11B parameters).
- Prompt Robustness:
  - Experiments show that increasing the number and diversity of prompts per dataset improves median performance and decreases variability on unseen tasks (a minimal way to quantify this spread is sketched after this list).
  - Training on a more extensive collection of datasets generally boosts performance, though it does not consistently reduce performance variability.
- Comparison with Alternative Models:
  - T0 and its variants achieve competitive results against FLAN, another multitask prompted model; in some cases, T0++ achieves higher accuracy despite being roughly an order of magnitude smaller in parameter count.
  - T0 shows strong performance on BIG-bench tasks, often surpassing the benchmark's baseline models and indicating effective generalization to novel tasks.
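The prompt-robustness comparison above amounts to looking at the spread of accuracies obtained with different prompts for the same held-out task. A minimal way to quantify that spread, assuming per-prompt accuracy numbers are already available, is sketched below; the exact statistics the paper reports may differ, and the accuracy values here are made up for illustration.

```python
# Sketch of quantifying robustness to prompt wording: given per-prompt
# accuracies on one held-out task, report the median and the interquartile
# range (spread). The numbers below are made up for illustration.
import statistics

def prompt_robustness(per_prompt_accuracy: list[float]) -> dict:
    q1, q2, q3 = statistics.quantiles(per_prompt_accuracy, n=4)
    return {
        "median": q2,                    # typical performance across prompts
        "interquartile_range": q3 - q1,  # smaller means less prompt sensitivity
    }

# Hypothetical accuracies for the same task under ten different prompts.
accuracies = [0.61, 0.58, 0.64, 0.62, 0.59, 0.63, 0.60, 0.57, 0.65, 0.62]
print(prompt_robustness(accuracies))
```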
Implications
Theoretical Implications
The findings reinforce the hypothesis that explicit multitask learning is a potent mechanism for achieving zero-shot generalization. The experiments delineate the benefits of task and prompt diversity during fine-tuning, suggesting that multitask prompted training can yield robust, adaptable models without extremely large parameter counts.
Practical Implications
The practical implications are significant. T0's ability to generalize well to unseen tasks at a fraction of GPT-3's size implies substantial resource savings in deployment and inference. It democratizes access to high-performing zero-shot models by lowering the computational and financial barriers to training and deploying them, extending their benefits to a broader range of applications and institutions.
Future Work
Future research may involve:
- Exploring the upper limits of generalization achieved by further increasing prompt and task diversity.
- Refining the prompting framework to enhance semantic understanding and task adaptability.
- Investigating the balance between model size, diversity of training data, and performance outcomes to optimize resource utilization.
This paper lays essential groundwork for advancing zero-shot task generalization through multitask learning, offering a pragmatic and scalable alternative to pretraining at ever larger scales. Its findings are likely to shape future work toward more efficient and versatile models in natural language processing.