
PPT: Pre-trained Prompt Tuning for Few-shot Learning (2109.04332v3)

Published 9 Sep 2021 in cs.CL

Abstract: Prompts for pre-trained language models (PLMs) have shown remarkable performance by bridging the gap between pre-training tasks and various downstream tasks. Among these methods, prompt tuning, which freezes PLMs and only tunes soft prompts, provides an efficient and effective solution for adapting large-scale PLMs to downstream tasks. However, prompt tuning is yet to be fully explored. In our pilot experiments, we find that prompt tuning performs comparably with conventional full-model fine-tuning when downstream data are sufficient, whereas it performs much worse under few-shot learning settings, which may hinder the application of prompt tuning in practice. We attribute this low performance to the manner of initializing soft prompts. Therefore, in this work, we propose to pre-train prompts by adding soft prompts into the pre-training stage to obtain a better initialization. We name this Pre-trained Prompt Tuning framework "PPT". To ensure the generalization of PPT, we formulate similar classification tasks into a unified task form and pre-train soft prompts for this unified task. Extensive experiments show that tuning pre-trained prompts for downstream tasks can reach or even outperform full-model fine-tuning under both full-data and few-shot settings. Our approach is effective and efficient for using large-scale PLMs in practice.

Analyzing the Potential of Pre-trained Prompt Tuning for Few-shot Learning

The paper, "PPT: Pre-trained Prompt Tuning for Few-shot Learning," presents a novel approach designed to enhance the efficacy of prompt tuning, particularly in the context of few-shot learning scenarios using pre-trained LLMs (PLMs). This work addresses a critical shortcoming in conventional prompt tuning methodologies, wherein initializing soft prompts poses challenges, especially when data scarcity is prevalent.

Key Concepts and Methodology

Prompt tuning (PT) has emerged as an efficient method for adapting PLMs to downstream tasks: the PLM is kept frozen and only a small set of continuous soft prompt vectors, prepended to the input, is trained. Despite performing on par with full fine-tuning when downstream data are abundant, PT falls well short under few-shot conditions. The authors attribute this gap primarily to the poor initialization of the soft prompts.
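
To make this concrete, below is a minimal PyTorch sketch of the prompt-tuning idea (an illustration, not the authors' implementation): a small matrix of trainable prompt vectors is prepended to the input embeddings of a frozen backbone, and label scores are read out through a frozen verbalizer. The toy Transformer backbone, vocabulary size, prompt length, and label token ids are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class SoftPromptClassifier(nn.Module):
    """Toy prompt tuning: only `self.prompt` receives gradients; the backbone is frozen."""

    def __init__(self, vocab_size=32000, d_model=256, prompt_len=20):
        super().__init__()
        # Stand-in for a pre-trained LM backbone (frozen during prompt tuning).
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)

        # The soft prompt: prompt_len continuous vectors prepended to every input.
        self.prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

        # Freeze everything except the soft prompt.
        for name, p in self.named_parameters():
            if name != "prompt":
                p.requires_grad = False

    def forward(self, input_ids, label_token_ids):
        # input_ids: (batch, seq_len); label_token_ids: (num_labels,) verbalizer tokens
        tok = self.embed(input_ids)                              # (batch, seq, d_model)
        prompt = self.prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        hidden = self.backbone(torch.cat([prompt, tok], dim=1))  # prepend the soft prompt
        h = hidden[:, 0]                                         # read out at the first position
        # Verbalizer: score label words against the frozen embedding matrix.
        return h @ self.embed(label_token_ids).T                 # (batch, num_labels)

model = SoftPromptClassifier()
optimizer = torch.optim.AdamW([model.prompt], lr=3e-2)   # only the prompt is tuned
logits = model(torch.randint(0, 32000, (8, 64)), torch.tensor([11, 12]))  # dummy batch
```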

The paper introduces Pre-trained Prompt Tuning (PPT) as a solution: soft prompts are inserted during the pre-training stage of the PLM itself, so that downstream prompt tuning starts from a well-trained initialization rather than a random one, which in turn improves task-specific performance. To keep the pre-trained prompts broadly applicable, the authors formulate classification tasks into a unified form spanning three main types: sentence-pair classification, multiple-choice classification, and single-text classification, with a dedicated self-supervised pre-training task for each type used to pre-train the soft prompts.
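
The sketch below (again an assumption-laden illustration, reusing the SoftPromptClassifier above with synthetic batches in place of real corpora and datasets) shows the two-stage PPT recipe: first train a prompt on a self-supervised task cast in the unified classification format, then copy those weights in as the initialization for few-shot downstream tuning instead of starting from random vectors.

```python
import torch

def fake_batches(vocab_size=32000, num_labels=2, batch=8, seq=64):
    """Synthetic (input_ids, labels) batches standing in for real pre-training/task data."""
    while True:
        yield (torch.randint(0, vocab_size, (batch, seq)),
               torch.randint(0, num_labels, (batch,)))

def tune_prompt(model, batches, label_token_ids, lr=3e-2, steps=100):
    """Train only the soft prompt; the frozen backbone is untouched."""
    opt = torch.optim.AdamW([model.prompt], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _, (input_ids, labels) in zip(range(steps), batches):
        loss = loss_fn(model(input_ids, label_token_ids), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model.prompt.detach().clone()

# Stage 1 (prompt pre-training): a self-supervised task in the unified form,
# e.g. deciding whether two sampled segments are adjacent in the corpus.
pretrained_prompt = tune_prompt(model, fake_batches(), torch.tensor([11, 12]))

# Stage 2 (few-shot downstream tuning): initialize from the pre-trained prompt
# rather than from random vectors, then tune on the few labeled examples.
with torch.no_grad():
    model.prompt.copy_(pretrained_prompt)
tune_prompt(model, fake_batches(), torch.tensor([21, 22]), steps=32)
```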

Empirical Evaluations

The paper reports extensive experimental evaluations across several datasets and PLMs, notably T5-XXL, mT5-XXL, and CPM-2. The results show that with PPT, prompt tuning closely matches, and in some cases surpasses, full-model fine-tuning, and this holds under both full-data and few-shot settings.

Quantitatively, the paper highlights that PPT not only improves accuracy but also yields lower variance in results across tasks. For instance, on sentiment classification and natural language inference tasks, PPT significantly outperforms conventional PT. These results support the hypothesis that prompts well initialized via pre-training are a pivotal factor in strong few-shot performance.

Theoretical and Practical Implications

The theoretical significance of this work lies in how it reframes the practical use of large-scale PLMs across NLP applications. By pinpointing and addressing the initialization weakness inherent in prompt tuning, the authors propose a more resilient approach that adapts to varied data conditions.

From a practical standpoint, the PPT framework offers a compelling alternative to full-model fine-tuning, particularly where computational resources are constrained. Because the PLM stays frozen, only a small soft prompt has to be trained and stored per task, so a single large model can be shared across many tasks without extensive retraining or per-task model copies.
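
As a rough back-of-the-envelope illustration of that storage argument (the prompt length and model size below are assumed ballpark figures, not numbers reported in the paper), a per-task soft prompt is orders of magnitude smaller than a fine-tuned copy of an 11B-parameter model:

```python
prompt_len, hidden_size = 100, 4096       # assumed prompt length and hidden width
full_model_params = 11e9                  # roughly T5-XXL scale

prompt_params = prompt_len * hidden_size  # ~0.4M values stored per task
print(f"per-task prompt: {prompt_params:,} params "
      f"({prompt_params / full_model_params:.6%} of a full model copy)")
```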

Future Prospects

The PPT framework sets a precedent for future research avenues in two fundamental ways. First, it opens up possibilities for expanding this approach to other domains and task types beyond classification, including generative tasks and multilingual contexts. Second, the methodology can inform enhanced techniques in training PLMs by incorporating self-supervised pre-training stages that better align with downstream applications.

In conclusion, the paper presents a well-founded, empirically backed approach to prompt tuning that makes a meaningful contribution to few-shot learning. By leveraging pre-trained prompts, it not only improves the effectiveness of PT but also provides a scalable, resource-efficient alternative that holds up in both data-scarce and data-rich scenarios.

Authors (4)
  1. Yuxian Gu (21 papers)
  2. Xu Han (270 papers)
  3. Zhiyuan Liu (433 papers)
  4. Minlie Huang (225 papers)
Citations (364)