
Learning How to Ask: Querying LMs with Mixtures of Soft Prompts (2104.06599v1)

Published 14 Apr 2021 in cs.CL and cs.LG

Abstract: Natural-language prompts have recently been used to coax pretrained language models into performing other AI tasks, using a fill-in-the-blank paradigm (Petroni et al., 2019) or a few-shot extrapolation paradigm (Brown et al., 2020). For example, language models retain factual knowledge from their training corpora that can be extracted by asking them to "fill in the blank" in a sentential prompt. However, where does this prompt come from? We explore the idea of learning prompts by gradient descent -- either fine-tuning prompts taken from previous work, or starting from random initialization. Our prompts consist of "soft words," i.e., continuous vectors that are not necessarily word type embeddings from the language model. Furthermore, for each task, we optimize a mixture of prompts, learning which prompts are most effective and how to ensemble them. Across multiple English LMs and tasks, our approach hugely outperforms previous methods, showing that the implicit factual knowledge in language models was previously underestimated. Moreover, this knowledge is cheap to elicit: random initialization is nearly as good as informed initialization.

Learning How to Ask: Querying LMs with Mixtures of Soft Prompts

The paper "Learning How to Ask: Querying LMs with Mixtures of Soft Prompts" by Guanghui Qin and Jason Eisner presents a novel approach to elicit information from pretrained LLMs (LMs) using soft prompts. This method involves learning and optimizing prompts in a continuous vector space, significantly enhancing the extraction of factual and commonsense knowledge.

Overview

Pretrained language models like BERT, RoBERTa, and BART have been shown to store factual knowledge, but how well that knowledge can be queried depends heavily on the form of the prompt. Traditional approaches either craft prompts manually or derive templates via data mining and paraphrasing. This work shifts the paradigm by using gradient descent to optimize "soft prompts," which are sequences of continuous vectors rather than embeddings of actual words.
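
To make the idea concrete, here is a minimal sketch (not the authors' code) of training a soft prompt against a frozen HuggingFace BERT masked LM: a handful of trainable vectors are prepended to the subject's embeddings, and only those vectors receive gradient updates so that the model fills the [MASK] slot with the correct object.

```python
# Minimal sketch, assuming a HuggingFace masked LM; prompt placement details
# (e.g., putting vectors around the subject) are simplified.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.requires_grad_(False)  # the LM stays frozen; only the prompt is learned

n_prompt_tokens = 5
soft_prompt = torch.nn.Parameter(torch.randn(n_prompt_tokens, model.config.hidden_size) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def prompt_loss(subject: str, obj: str) -> torch.Tensor:
    # Build "<soft prompt> <subject> [MASK]" in embedding space and score the object.
    embed = model.get_input_embeddings()
    subj_ids = tokenizer(subject, add_special_tokens=False, return_tensors="pt").input_ids
    obj_id = tokenizer(obj, add_special_tokens=False).input_ids[0]  # assumes a single-token object
    mask_emb = embed(torch.tensor([[tokenizer.mask_token_id]]))
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), embed(subj_ids), mask_emb], dim=1)
    labels = torch.full(inputs_embeds.shape[:2], -100, dtype=torch.long)  # ignore every position...
    labels[0, -1] = obj_id                                                # ...except the [MASK] slot
    return model(inputs_embeds=inputs_embeds, labels=labels).loss

# One gradient step on a single (subject, object) pair for a "capital of" relation.
loss = prompt_loss("France", "Paris")
loss.backward()
optimizer.step()
```

In the paper the prompt vectors may also appear between and after the entity mention, and training iterates over all triples of a relation; the sketch keeps a single prepended prompt for brevity.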

Methodology

The soft prompts are trained via gradient descent, either by fine-tuning the embeddings of existing templates or by starting from random initialization. For each task, the method learns a mixture of prompts, evaluated across several LMs: the system learns how to ensemble the different prompts effectively, optionally with data-dependent mixture weights.

Key components include:

  • Soft Prompts: Utilizing continuous vectors for tokens in prompts, unlocking a more expressive and tunable prompt space.
  • Deeply Perturbed Prompts: An enhancement allowing perturbations across all layers in the model, not just at the input level.
  • Mixture Modeling: The approach leverages a combination of multiple soft prompts, optimizing mixture weights based on their performance (see the sketch after this list).
  • Data-Dependent Weights: Adjusting prompts based on specific input data, although this did not significantly impact results.
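
The mixture can be sketched as follows; this is a hedged illustration rather than the paper's implementation, and `fill_in_blank` is a hypothetical helper (for instance, the masked-LM call from the earlier snippet) that returns log-probabilities over the vocabulary for the answer slot of one prompt. The softmax-normalized mixture weights are learned jointly with the prompts.

```python
import torch

K, n_prompt_tokens, dim = 4, 5, 768
prompts = torch.nn.Parameter(torch.randn(K, n_prompt_tokens, dim) * 0.02)
mixture_logits = torch.nn.Parameter(torch.zeros(K))  # one learned weight per prompt

def mixture_log_probs(fill_in_blank, subject_ids):
    """Combine the K per-prompt answer distributions into one mixture distribution.

    fill_in_blank(prompt, subject_ids) -> (vocab_size,) log-probs at the [MASK] slot.
    """
    log_pi = torch.log_softmax(mixture_logits, dim=0)                  # (K,)
    per_prompt = torch.stack([fill_in_blank(prompts[k], subject_ids)
                              for k in range(K)])                      # (K, vocab_size)
    # log p(y | x) = logsumexp_k [ log pi_k + log p(y | x, prompt_k) ]
    return torch.logsumexp(log_pi.unsqueeze(1) + per_prompt, dim=0)    # (vocab_size,)
```

Training then maximizes the mixture log-probability of the gold object; making `mixture_logits` a function of the input would give the data-dependent weights mentioned above.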

Results

The paper evaluates its approach on datasets such as T-REx, Google-RE, and ConceptNet, demonstrating substantial improvements over baseline methods. Tuning randomly initialized soft prompts often outperformed manually crafted ones, suggesting that LMs' implicit knowledge is underestimated when queried with conventional prompts. Notably, even random initialization produced strong results, with precision@1 improving by up to 50 points in some cases.
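
For reference, precision@1 here simply measures how often the top-ranked prediction equals the gold object; a small illustration (not tied to the paper's evaluation code):

```python
def precision_at_1(ranked_predictions, gold_objects):
    """ranked_predictions: one best-first candidate list per query; gold_objects: the true answers."""
    hits = sum(preds[0] == gold for preds, gold in zip(ranked_predictions, gold_objects))
    return hits / len(gold_objects)

# e.g. precision_at_1([["Paris", "Lyon"], ["Berlin", "Munich"]], ["Paris", "Munich"]) == 0.5
```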

Implications

The implications of this research span both practical and theoretical domains. Practically, it suggests more efficient ways to extract knowledge from LMs, reducing the reliance on expert-crafted prompts. Theoretically, it indicates that much of the LMs' knowledge remains untapped due to suboptimal prompting forms. The ability to elicit this knowledge accurately can contribute to downstream tasks requiring factual understanding.

Future Directions

Potential future directions include exploring other forms of soft prompt initialization and tuning, such as leveraging different neural architectures or broader application contexts. Additionally, extending this approach to few-shot learning scenarios presents an intriguing opportunity, potentially enhancing machine learning systems' adaptability.

Overall, this paper makes a significant contribution to understanding how to optimize interaction with pretrained LMs, offering a more nuanced view of their capabilities and limitations.

Authors (2)
  1. Guanghui Qin (16 papers)
  2. Jason Eisner (56 papers)
Citations (499)