Continued Pretraining for Better Zero- and Few-Shot Promptability (2210.10258v2)
Abstract: Recently introduced LLM prompting methods can achieve high accuracy in zero- and few-shot settings while requiring few to no learned task-specific parameters. Nevertheless, these methods still often trail full model finetuning. In this work, we investigate whether a dedicated continued pretraining stage can improve "promptability", i.e., zero-shot performance with natural language prompts or few-shot performance with prompt tuning. We identify settings where existing continued pretraining methods lack promptability, as well as current methodological gaps, which we fill with thorough large-scale experiments. We demonstrate that a simple recipe, continued pretraining that incorporates a trainable prompt during multi-task learning, improves promptability in both zero- and few-shot settings compared to existing methods, by up to 31% relative. In contrast, we find that continued pretraining using MAML-style meta-learning, a method that directly optimizes few-shot promptability, yields subpar performance. We validate our findings with two prompt tuning methods and, based on our results, provide concrete recommendations for optimizing promptability in different use cases.
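As a rough illustration of the recipe the abstract describes (a trainable prompt incorporated into multi-task continued pretraining), the following minimal PyTorch sketch prepends trainable soft-prompt embeddings to the token embeddings and updates them jointly with a toy model on a round-robin mixture of tasks. All names, sizes, the toy encoder, and the round-robin sampling are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the paper works with large pretrained LMs, not this toy setup.
VOCAB, D_MODEL, PROMPT_LEN, N_LABELS = 1000, 64, 8, 4


class PromptedModel(nn.Module):
    """Toy encoder with a trainable soft prompt prepended to the token embeddings."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        # The trainable prompt, shared across all tasks during continued pretraining.
        self.soft_prompt = nn.Parameter(torch.randn(PROMPT_LEN, D_MODEL) * 0.02)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, N_LABELS)

    def forward(self, input_ids):
        tok = self.embed(input_ids)                                  # (B, T, D)
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        hidden = self.encoder(torch.cat([prompt, tok], dim=1))       # prepend prompt
        return self.head(hidden.mean(dim=1))                         # pooled logits


def multitask_continued_pretraining(model, task_loaders, steps=100, lr=1e-4):
    """Draw batches from a mixture of tasks; the model and prompt are updated jointly."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for step in range(steps):
        task = task_loaders[step % len(task_loaders)]   # simple round-robin task mixture
        input_ids, labels = next(task)
        opt.zero_grad()
        loss_fn(model(input_ids), labels).backward()
        opt.step()


# Toy usage: two random-data "tasks" stand in for the multi-task pretraining mixture.
def toy_task():
    while True:
        yield torch.randint(0, VOCAB, (4, 16)), torch.randint(0, N_LABELS, (4,))


multitask_continued_pretraining(PromptedModel(), [toy_task(), toy_task()], steps=10)
```

In this kind of setup, the prompt learned during continued pretraining could then serve as the starting point for downstream prompt tuning, which is in the spirit of the few-shot promptability the abstract measures; the exact transfer procedure here is an assumption, not the paper's specification.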
- Zhaofeng Wu
- Robert L. Logan IV
- Pete Walsh
- Akshita Bhagia
- Dirk Groeneveld
- Sameer Singh
- Iz Beltagy