Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning
The paper "Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning" addresses the challenges faced by traditional prompt learning in NLP, specifically the issues of rote memorization and unstable generalization. This work introduces a novel framework aimed at separating knowledge from memorization to enhance the generalization capability of LLMs, particularly in few-shot and zero-shot settings.
Framework
The core contribution is a retrieval-augmented framework built around an open-book knowledge-store constructed from the training instances. Retrieval is incorporated at the input, training, and inference stages, so the model can consult additional contextual information when making predictions rather than encoding everything in its parameters.
Key Components
- Open-book Knowledge-store: A set of key-value pairs derived from the training data, where keys are prompt-based embeddings of training examples and values are their labels. The store acts as an external knowledge repository that the model can consult instead of committing every example to its parameters (a construction sketch follows this list).
- Neural Demonstrations: Instead of concatenating discrete textual demonstrations, which are limited by the model's maximum input length, the method augments the input sequence with retrieved dense (neural) representations (also sketched below).
- kNN-guided Training and Prediction: The framework uses k-nearest-neighbor (kNN) retrieval to guide both training and prediction. During training, the retrieved neighbors help identify hard examples, and the loss is re-weighted to emphasize them. At inference, the kNN-based label distribution is interpolated with the model's own output distribution to make the final decision.
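The following minimal sketch illustrates how an open-book knowledge-store and neural demonstrations could be realized. It is not the paper's implementation: the prompt template, the `encoder` callable (assumed to return the [MASK]-position embedding of a prompted input), and cosine-similarity retrieval are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def build_knowledge_store(encoder, train_examples):
    """Encode each training instance with a prompt template and store
    (embedding, label) pairs as the open-book knowledge-store."""
    keys, values = [], []
    for text, label in train_examples:
        prompt = f"{text} It was [MASK]."   # illustrative template, not the paper's
        with torch.no_grad():
            emb = encoder(prompt)           # assumed: (hidden_dim,) embedding at the [MASK] position
        keys.append(emb)
        values.append(label)
    return torch.stack(keys), torch.tensor(values)

def retrieve_neural_demonstrations(query_emb, keys, k=3):
    """Return the k nearest stored embeddings as dense 'neural demonstrations'
    that can be concatenated with the query representation, avoiding the
    length limits of discrete text demonstrations."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), keys, dim=-1)  # (num_keys,)
    top_idx = sims.topk(k).indices
    return keys[top_idx]                    # (k, hidden_dim)
```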
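Continuing the sketch, the kNN-guided components might look as follows: a non-parametric label distribution built from the retrieved neighbors, a kNN-LM-style interpolation with the model's output, and one plausible way to up-weight hard training examples. The loss-weighting heuristic in particular is an assumption; the paper's exact scheme may differ.

```python
import torch
import torch.nn.functional as F

def knn_label_distribution(query_emb, keys, values, num_labels, k=16, temperature=1.0):
    """Convert the k nearest neighbours into a probability distribution over
    labels, weighting each neighbour by a softmax over negative distances."""
    dists = torch.cdist(query_emb.unsqueeze(0), keys).squeeze(0)      # (num_keys,)
    nearest = dists.topk(k, largest=False)
    weights = F.softmax(-nearest.values / temperature, dim=-1)        # closer neighbours weigh more
    p_knn = torch.zeros(num_labels)
    p_knn.scatter_add_(0, values[nearest.indices], weights)
    return p_knn

def knn_guided_loss_weight(p_knn, gold_label, boost=1.0):
    """One plausible reading of kNN-guided training: examples whose neighbours
    put little mass on the gold label count as 'hard' and get a larger loss weight."""
    return 1.0 + boost * (1.0 - p_knn[gold_label])

def interpolate_prediction(p_model, p_knn, lam=0.5):
    """kNN-LM-style interpolation of the parametric model distribution with
    the non-parametric kNN distribution for the final decision."""
    return lam * p_knn + (1.0 - lam) * p_model
```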
Experimental Evaluation
The framework was evaluated across multiple NLP tasks, including sentiment analysis, natural language inference, and information extraction. The results showed that the method outperformed existing prompt learning approaches as well as knowledge-enhanced baselines, with particularly large gains in few-shot and zero-shot settings.
- Few-shot Settings: The introduction of retrieval augmentation reduced the dependency on parametric memorization, allowing models to utilize scarce data more effectively.
- Zero-shot Settings: The model leveraged unlabeled data efficiently for retrieval without requiring extensive fine-tuning, showcasing robust generalization to unseen tasks.
Furthermore, the model exhibited enhanced stability over traditional methods, which is particularly beneficial in low-resource scenarios.
Implications and Future Directions
The implications of this research are twofold: practically, it offers a more efficient way to use the available data by reducing the need for rote memorization; theoretically, it advances understanding of how decoupling knowledge from memorization can improve model generalization.
Future work could explore extending this approach to other areas such as question answering and natural language generation. Additionally, optimizing the retrieval mechanism's efficiency for large datasets and further integrating unsupervised learning methods could enhance the framework's applicability across diverse tasks.
In summary, "Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning" presents a compelling advancement in prompt learning methodologies, offering an innovative solution to improve the generalization abilities of LLMs by systematically leveraging retrieval mechanisms from internal data instances.