Incorporating Knowledge into Prompt Verbalizer for Text Classification
The paper presents an innovative approach to prompt-tuning pre-trained language models (PLMs) for text classification, specifically targeting the deficiencies of the verbalizer component in prompt-tuning techniques. Its core idea is to expand the label word space with external knowledge bases (KBs), thereby optimizing the verbalizer and enabling knowledgeable prompt-tuning (KPT).
Prompt-tuning has shown remarkable potential in classification tasks, particularly under data-scarce conditions. However, the construction of the verbalizer, which maps between the label space and the label word space, has been a bottleneck because it relies either on manual crafting or on data-intensive search algorithms. These methods often introduce bias and yield high variance, limiting their robustness, especially in zero-shot and few-shot scenarios.
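To make the verbalizer's role concrete, the sketch below shows a conventional one-to-one verbalizer scoring a masked-language-model prompt with Hugging Face Transformers. The template, class names, and label words here are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a hand-crafted, one-to-one verbalizer for prompt-based
# classification. Template and classes are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Each class maps to exactly one label word.
verbalizer = {"science": " science", "sports": " sports"}

def classify(text: str) -> str:
    # Wrap the input in a prompt template containing a mask slot.
    prompt = f"A {tokenizer.mask_token} news: {text}"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the mask position and take the PLM's distribution over the vocabulary.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    probs = logits[0, mask_pos].softmax(-1)
    # Score each class by the probability of its single label word.
    scores = {
        label: probs[tokenizer.convert_tokens_to_ids(tokenizer.tokenize(word)[0])].item()
        for label, word in verbalizer.items()
    }
    return max(scores, key=scores.get)
```

Because the whole prediction hinges on that single hand-picked word per class, the quality of the mapping directly bounds accuracy, which is the bottleneck KPT targets.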
The proposed KPT addresses these challenges by expanding the verbalizer with external KBs, enlarging the label word space far beyond what handcrafted or gradient-searched verbalizers can cover. The methodology proceeds in three stages: construction, refinement, and utilization.
- Construction: The expansion leverages KBs to cover a broader range of label words, capturing different granularities and perspectives. For example, in a topic classification task, KPT extends single-word mappings to sets of words (e.g., "science" might encompass "physics," "chemistry," and "biology"), thereby creating a richer vocabulary available for classification.
- Refinement: Because the expansion inevitably introduces noisy label words, four refinement strategies are employed (see the sketches following this list):
- Frequency Refinement removes label words whose contextualized prior probability is low, since the PLM rarely predicts them and they add little reliable signal.
- Relevance Refinement retains only label words that are markedly more relevant to their own class than to the other classes.
- Contextualized Calibration divides each label word's predicted probability by its contextualized prior, correcting the bias that high-frequency words would otherwise introduce.
- Learnable Refinement assigns each label word a weight that is trained on the labeled data in few-shot settings, so noisy words can be down-weighted when label word contributions are averaged.
- Utilization: The predicted probabilities of the refined label words are combined by plain averaging (zero-shot) or by weighted averaging that uses the weights learned during training (few-shot) to produce the final class score, as sketched below.
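For the refinement stage, the following sketch illustrates how a contextualized prior might be estimated from a small unlabeled support set and used for frequency refinement; the support-set wrapper, threshold value, and helper names are assumptions for illustration, reusing the `tokenizer` and `model` from the sketch above.

```python
# Hedged sketch of the contextualized prior and frequency refinement.
# The template, threshold, and function names are illustrative assumptions.
import torch

def estimate_prior(support_texts, tokenizer, model) -> torch.Tensor:
    """Average the PLM's mask distribution over unlabeled texts wrapped in the template."""
    dists = []
    for text in support_texts:
        prompt = f"A {tokenizer.mask_token} news: {text}"
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
        dists.append(logits[0, mask_pos].softmax(-1))
    return torch.stack(dists).mean(dim=0)   # prior probability per vocabulary id

def frequency_refine(word_ids: dict[str, list[int]],
                     prior: torch.Tensor,
                     threshold: float = 1e-6) -> dict[str, list[int]]:
    """Drop KB-derived label words whose contextualized prior falls below a threshold."""
    return {label: [i for i in ids if prior[i] > threshold]
            for label, ids in word_ids.items()}
```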
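The utilization stage could then combine the refined label words roughly as follows; the function signature and the weight parameterization are again illustrative, not the paper's exact implementation.

```python
# Minimal sketch of calibration plus (weighted) averaging over label words.
import torch

def score_classes(mask_probs: torch.Tensor,              # P(word | prompt), shape (vocab,)
                  prior: torch.Tensor,                    # contextualized prior, shape (vocab,)
                  word_ids: dict[str, list[int]],         # class -> refined label word ids
                  weights: dict[str, torch.Tensor] | None = None) -> str:
    scores = {}
    for label, ids in word_ids.items():
        # Contextualized calibration: divide out each word's prior so that
        # frequent words do not dominate rare but relevant ones.
        calibrated = mask_probs[ids] / prior[ids]
        if weights is None:
            scores[label] = calibrated.mean().item()           # zero-shot: plain average
        else:
            w = torch.softmax(weights[label], dim=0)           # few-shot: learned weights
            scores[label] = (w * calibrated).sum().item()      # weighted average
    return max(scores, key=scores.get)
```

In the zero-shot case the weights are simply absent, while in the few-shot case they would be trained jointly with the prompt on the available labeled examples.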
Empirically, KPT delivers significant performance gains across various datasets and classification tasks, reducing error rates by 16% and 18% in zero-shot settings and yielding marked improvements in few-shot scenarios. It also reduces variance across runs, producing more stable predictions than conventional prompt-tuning. This robustness stems from the incorporated external knowledge, which compensates for the scarcity of labeled data in few-shot and zero-shot settings.
In summary, this work marks a meaningful step toward harnessing external knowledge to mitigate limitations intrinsic to PLM prompting. Future investigations could extend KPT beyond text classification, for instance to text generation or other NLP tasks. Furthermore, as KBs evolve, the label word mappings can be refined and enriched, further improving robustness and performance.