
Learning How to Ask: Querying LMs with Mixtures of Soft Prompts

Published 14 Apr 2021 in cs.CL and cs.LG (arXiv:2104.06599v1)

Abstract: Natural-language prompts have recently been used to coax pretrained language models into performing other AI tasks, using a fill-in-the-blank paradigm (Petroni et al., 2019) or a few-shot extrapolation paradigm (Brown et al., 2020). For example, language models retain factual knowledge from their training corpora that can be extracted by asking them to "fill in the blank" in a sentential prompt. However, where does this prompt come from? We explore the idea of learning prompts by gradient descent -- either fine-tuning prompts taken from previous work, or starting from random initialization. Our prompts consist of "soft words," i.e., continuous vectors that are not necessarily word type embeddings from the language model. Furthermore, for each task, we optimize a mixture of prompts, learning which prompts are most effective and how to ensemble them. Across multiple English LMs and tasks, our approach hugely outperforms previous methods, showing that the implicit factual knowledge in language models was previously underestimated. Moreover, this knowledge is cheap to elicit: random initialization is nearly as good as informed initialization.

Citations (499)

Summary

  • The paper introduces a novel gradient descent method to optimize soft prompts, significantly improving factual extraction with up to 50-point precision gains.
  • It employs a mixture modeling approach that combines multiple soft prompts, optionally with data-dependent weights, to enhance adaptability across tasks.
  • The study demonstrates that randomly initialized soft prompts often outperform manually crafted ones, revealing underutilized knowledge in language models.

Learning How to Ask: Querying LMs with Mixtures of Soft Prompts

The paper "Learning How to Ask: Querying LMs with Mixtures of Soft Prompts" by Guanghui Qin and Jason Eisner presents a novel approach to elicit information from pretrained LMs using soft prompts. This method involves learning and optimizing prompts in a continuous vector space, significantly enhancing the extraction of factual and commonsense knowledge.

Overview

Pretrained LMs like BERT, RoBERTa, and BART have shown promise in storing factual knowledge. However, how effectively that knowledge can be queried depends heavily on the form of the prompt. Traditional approaches either craft templates manually or derive them through data mining and paraphrasing. This work shifts the paradigm by employing gradient descent to optimize "soft prompts," which are sequences of continuous vectors rather than embeddings of actual words.

Methodology

The soft prompts are trained via gradient descent, either by fine-tuning embeddings of existing templates or by starting from random initialization. For each task, the method learns a mixture of prompts: the system learns which prompts are most effective and how to ensemble them, optionally with data-dependent mixture weights.
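The core idea can be sketched in a toy setting: the LM's parameters stay frozen, and only the soft-prompt vectors receive gradient updates. The snippet below is a minimal illustration, not the paper's implementation; the "frozen LM" is a hypothetical fixed linear scorer, and all names and dimensions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen LM: it scores vocabulary items from the
# mean of its input vectors through a fixed (never-updated) linear map.
d_model, vocab, prompt_len = 16, 10, 3
W_frozen = rng.normal(size=(d_model, vocab))

def lm_logits(prompt_vecs, subject_vec):
    # "Fill in the blank": pool the soft prompt with a subject embedding.
    pooled = (prompt_vecs.sum(axis=0) + subject_vec) / (len(prompt_vecs) + 1)
    return pooled @ W_frozen

def loss_and_grad(prompt_vecs, subject_vec, target_id):
    logits = lm_logits(prompt_vecs, subject_vec)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    loss = -np.log(probs[target_id])
    # Cross-entropy gradient, backpropagated only to the prompt vectors.
    d_logits = probs.copy()
    d_logits[target_id] -= 1.0
    d_pooled = W_frozen @ d_logits / (len(prompt_vecs) + 1)
    grad = np.tile(d_pooled, (len(prompt_vecs), 1))
    return loss, grad

# Random initialization of the soft prompt (the paper finds this is
# nearly as good as initializing from a hand-written template).
prompt = rng.normal(scale=0.1, size=(prompt_len, d_model))
subject = rng.normal(size=d_model)
target = 4

losses = []
for step in range(200):
    loss, grad = loss_and_grad(prompt, subject, target)
    losses.append(loss)
    prompt -= 0.5 * grad  # only the prompt moves; W_frozen stays fixed

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the prompt vectors are unconstrained, they need not land on any word-type embedding, which is exactly what makes the prompt space more expressive than discrete templates.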

Key components include:

  • Soft Prompts: Utilizing continuous vectors for tokens in prompts, unlocking a more expressive and tunable prompt space.
  • Deeply Perturbed Prompts: An enhancement allowing perturbations across all layers in the model, not just at the input level.
  • Mixture Modeling: The approach leverages a combination of multiple soft prompts, optimizing mixture weights based on their performance.
  • Data-Dependent Weights: Weighting prompts according to the specific input, although this did not significantly improve results.
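The mixture-modeling component above amounts to a weighted ensemble of the per-prompt predictive distributions. The sketch below illustrates only the ensembling step, under the assumption that each prompt's distribution over answers has already been computed; the distributions and mixture logits here are hypothetical stand-ins (in the paper they come from running the LM with each soft prompt and from joint training, respectively).

```python
import numpy as np

rng = np.random.default_rng(1)
K, vocab = 4, 10

# Hypothetical per-prompt answer distributions p(y | x, prompt_k);
# each row is what one soft prompt's LM query would predict.
per_prompt_probs = rng.dirichlet(np.ones(vocab), size=K)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Mixture logits; in the paper these are learned jointly with the
# prompts by gradient descent (fixed here just to show the ensemble).
mixture_logits = np.array([2.0, 0.5, -1.0, 0.0])
weights = softmax(mixture_logits)

# Ensemble: p(y | x) = sum_k w_k * p(y | x, prompt_k)
ensemble = weights @ per_prompt_probs
assert np.isclose(ensemble.sum(), 1.0)

prediction = int(ensemble.argmax())
print("mixture weights:", np.round(weights, 3), "prediction:", prediction)
```

A data-dependent variant would compute `mixture_logits` as a function of the input x rather than as free parameters, which is the extension the paper reports as not significantly improving results.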

Results

The study evaluates the approach on datasets such as T-REx, Google-RE, and ConceptNet, demonstrating substantial improvements over baseline methods. Tuned, randomly initialized soft prompts often outperformed manually crafted ones, suggesting that LMs' implicit knowledge is underestimated when queried with conventional prompts. Notably, precision-at-1 improved by up to 50 points in some cases.

Implications

The implications of this research span both practical and theoretical domains. Practically, it suggests more efficient ways to extract knowledge from LMs, reducing the reliance on expert-crafted prompts. Theoretically, it indicates that much of the LMs' knowledge remains untapped due to suboptimal prompting forms. The ability to elicit this knowledge accurately can contribute to downstream tasks requiring factual understanding.

Future Directions

Potential future directions include exploring other forms of soft prompt initialization and tuning, such as leveraging different neural architectures or broader application contexts. Additionally, extending this approach to few-shot learning scenarios presents an intriguing opportunity, potentially enhancing machine learning systems' adaptability.

Overall, this paper offers a significant contribution to understanding how to optimize the interaction with LLMs, offering a more nuanced view of their capabilities and limitations.
