Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners (2210.02969v4)

Published 6 Oct 2022 in cs.CL

Abstract: Meta-training, which fine-tunes the language model (LM) on various downstream tasks by maximizing the likelihood of the target label given the task instruction and input instance, has improved the zero-shot task generalization performance. However, meta-trained LMs still struggle to generalize to challenging tasks containing novel labels unseen during meta-training. In this paper, we propose Flipped Learning, an alternative method of meta-training which trains the LM to generate the task instruction given the input instance and label. During inference, the LM trained with Flipped Learning, referred to as Flipped, selects the label option that is most likely to generate the task instruction. On 14 tasks of the BIG-bench benchmark, the 11B-sized Flipped outperforms zero-shot T0-11B and even a 16 times larger 3-shot GPT-3 (175B) on average by 8.4% and 9.7% points, respectively. Flipped gives particularly large improvements on tasks with unseen labels, outperforming T0-11B by up to +20% average F1 score. This indicates that the strong task generalization of Flipped comes from improved generalization to novel labels. We release our code at https://github.com/seonghyeonye/Flipped-Learning.

Overview of "Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners"

The paper "Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners" introduces Flipped Learning, a meta-training method designed to improve zero-shot generalization in language models (LMs). Meta-trained LMs can perform unseen tasks zero-shot by conditioning on a task instruction concatenated with the input instance. However, they struggle on tasks whose labels were never seen during meta-training and generalize poorly to such novel label spaces.
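To make this concrete, the standard meta-training objective described above can be written, in notation assumed here rather than quoted from the paper, as maximizing the likelihood of the label given the instruction and input:

```latex
% Direct (standard) meta-training: generate the label y
% conditioned on the task instruction I and the input instance x.
\max_{\theta}\; \mathbb{E}_{(I,\,x,\,y)}\bigl[\log P_{\theta}(y \mid I,\, x)\bigr]
```

Under this objective, a label unseen during meta-training is an output sequence the model was never trained to produce, which is one way to read the generalization gap noted above.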

Flipped Learning Methodology

Flipped Learning inverts the conventional meta-training objective: rather than generating the correct label given the instruction and input, the model is trained to generate the task instruction given the input instance and its label. The label thus serves as conditioning input rather than as the prediction target. At inference time, the resulting model, referred to as "Flipped," scores each label option by how likely it is to generate the task instruction and selects the highest-scoring option. Because labels are consumed rather than produced, this design is intended to generalize better to novel labels. A minimal sketch of this inference rule follows.
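The scoring rule can be sketched with a generic seq2seq model from the transformers library. The checkpoint name, instruction text, and label options below are illustrative assumptions, not the paper's released model or prompt templates:

```python
# Minimal sketch of Flipped-style inference: pick the label option y that
# maximizes log P(instruction | input, y). Checkpoint and prompts are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/t5-v1_1-large"  # placeholder, not the paper's released Flipped model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

instruction = "Is the sentiment of this review positive or negative?"
review = "The plot was thin, but the performances were wonderful."
label_options = ["positive", "negative"]

def instruction_log_likelihood(input_text: str, label: str) -> float:
    """Return log P(instruction | input, label), the flipped scoring direction."""
    # Condition on the input instance concatenated with a candidate label...
    enc = tokenizer(f"{input_text} {label}", return_tensors="pt")
    # ...and measure how likely the model is to generate the task instruction.
    target = tokenizer(instruction, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**enc, labels=target)
    # out.loss is the mean per-token negative log-likelihood of the target,
    # so multiply by the target length to recover the sequence log-probability.
    return -out.loss.item() * target.shape[1]

# Flipped inference: choose the label most likely to "generate" the instruction.
scores = {y: instruction_log_likelihood(review, y) for y in label_options}
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```

This sketch covers only the scoring direction; the paper's meta-training recipe, including how instructions, inputs, and labels are templated, is what makes such scoring effective in practice.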

Experimental Evaluation and Results

The paper validates Flipped Learning on 14 tasks from the BIG-bench benchmark and several common English NLP tasks. Notably, the 11-billion-parameter Flipped model outperforms zero-shot T0-11B and the 16-times-larger GPT-3 175B (evaluated 3-shot) by 8.4 and 9.7 percentage points on average, respectively. Flipped also outperforms T0-11B by up to +20% average F1 score on datasets whose labels were unseen during meta-training. These gains indicate that Flipped's stronger task generalization stems largely from improved generalization to novel labels.

Theoretical and Practical Implications

Theoretically, this work suggests that rethinking which variables a model conditions on and which it generates during meta-training is a promising way to exploit the structure of instruction-based tasks. Practically, the method broadens the applicability of instruction-tuned models to real-world settings where tasks involve labels and formats not seen during training.

Future Prospects

In the context of emerging AI technologies, this work encourages a re-evaluation of conventional meta-training frameworks and motivates more adaptive, label-independent systems. Future research could explore scalable implementations of Flipped Learning or integrate its principles with reinforcement learning paradigms to further improve task adaptability.

In conclusion, Flipped Learning is a notable contribution to instruction-based meta-training of language models, yielding robust zero-shot learners suited to dynamic application environments. The results position it as a key methodology among techniques aimed at improving the adaptability and generalization of language models.

Authors (5)
  1. Seonghyeon Ye (25 papers)
  2. Doyoung Kim (19 papers)
  3. Joel Jang (30 papers)
  4. Joongbo Shin (14 papers)
  5. Minjoon Seo (82 papers)
Citations (23)