
PromptRank: Unsupervised Keyphrase Extraction Using Prompt (2305.04490v2)

Published 8 May 2023 in cs.IR

Abstract: The keyphrase extraction task refers to the automatic selection of phrases from a given document to summarize its core content. State-of-the-art (SOTA) performance has recently been achieved by embedding-based algorithms, which rank candidates according to how similar their embeddings are to document embeddings. However, such solutions either struggle with the document and candidate length discrepancies or fail to fully utilize the pre-trained language model (PLM) without further fine-tuning. To this end, in this paper, we propose a simple yet effective unsupervised approach, PromptRank, based on the PLM with an encoder-decoder architecture. Specifically, PromptRank feeds the document into the encoder and calculates the probability of generating the candidate with a designed prompt by the decoder. We extensively evaluate the proposed PromptRank on six widely used benchmarks. PromptRank outperforms the SOTA approach MDERank, improving the F1 score relatively by 34.18%, 24.87%, and 17.57% for 5, 10, and 15 returned results, respectively. This demonstrates the great potential of using prompt for unsupervised keyphrase extraction. We release our code at https://github.com/HLT-NLP/PromptRank.

Authors (7)
  1. Aobo Kong (11 papers)
  2. Shiwan Zhao (48 papers)
  3. Hao Chen (1007 papers)
  4. Qicheng Li (5 papers)
  5. Yong Qin (36 papers)
  6. Ruiqi Sun (9 papers)
  7. Xiaoyan Bai (4 papers)
Citations (14)

Summary

  • The paper introduces PromptRank, an unsupervised method for keyphrase extraction that uses prompts with pre-trained language models to rank candidates based on their generation probability, without requiring fine-tuning.
  • Evaluated on six benchmark datasets, PromptRank significantly outperformed the state-of-the-art method MDERank, showing relative F1 improvements of 34.18%, 24.87%, and 17.57% for the top 5, 10, and 15 keyphrases.
  • PromptRank is versatile across various pre-trained language models beyond T5, and its performance can be enhanced by factors like a position penalty for candidates within the document.

The paper "PromptRank: Unsupervised Keyphrase Extraction Using Prompt" introduces a novel method for keyphrase extraction from documents using an unsupervised approach with a pre-trained encoder-decoder architecture. The authors identify challenges with existing embedding-based keyphrase extraction methods, such as discrepancies between document and candidate length and the need to fine-tune pre-trained language models (PLMs). To address these issues, PromptRank employs a prompt-based mechanism in combination with PLMs to rank keyphrase candidates based on their generation probability within a structured template.
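The core ranking idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the candidates and per-token log-probabilities below are hypothetical stand-ins for what a real encoder-decoder PLM (e.g. T5) would assign when decoding each candidate inside a prompt template.

```python
# Hypothetical per-token log-probabilities a seq2seq PLM might assign
# when decoding each candidate within a prompt template; the values
# and candidates are illustrative stand-ins, not real model outputs.
token_logprobs = {
    "keyphrase extraction": [-0.4, -0.6],
    "neural networks": [-1.2, -1.5],
    "introduction": [-2.0],
}

def candidate_score(logprobs):
    """Length-normalized log-probability of generating the candidate,
    so multi-token candidates are not penalized for being longer."""
    return sum(logprobs) / len(logprobs)

# Rank candidates by how likely the decoder is to generate them.
ranked = sorted(token_logprobs,
                key=lambda c: candidate_score(token_logprobs[c]),
                reverse=True)
print(ranked[0])  # → keyphrase extraction
```

Length normalization is one simple way to handle candidates of different token lengths; the paper tunes this aspect more carefully, but the ranking principle (higher generation probability means a better keyphrase) is the same.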

Key Contributions:

  1. Prompt-Based Keyphrase Extraction: The approach utilizes prompts to expand candidate phrases into templates for comparison with document content. An encoder-decoder architecture (like T5) facilitates mapping these to a shared latent space, calculating a generation probability for ranking candidates without requiring additional training or fine-tuning.
  2. Evaluation and Results: The method was tested on six benchmark datasets, demonstrating robust performance. PromptRank significantly outperformed the state-of-the-art method, MDERank, with relative improvements in the F1 scores of 34.18%, 24.87%, and 17.57% for the top 5, 10, and 15 keyphrases extracted, respectively, across varying text lengths.
  3. Factors Influencing Performance: The paper further explores components affecting effectiveness, including candidate position, template length, and content of prompts. A position penalty is introduced to enhance performance by adjusting candidate scores based on their locations within a document.
  4. Versatility Across PLMs: PromptRank's architecture is adaptable to various PLMs beyond T5, including BART, showcasing its wide applicability to different pre-trained models in NLP tasks.
  5. Experiments and Analysis: The paper includes comprehensive experiments, including ablation studies that highlight the contributions of positional information and prompt design. The hyperparameters and configurations were carefully evaluated to maximize the generalization and robustness of PromptRank across diverse datasets.
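The position penalty from point 3 can be illustrated with a toy variant; the exact formulation (and its hyperparameters) is given in the paper, but the intuition is that a candidate's log-probability score is discounted more heavily the later its first occurrence falls in the document. The `alpha` parameter below is a hypothetical sensitivity knob, not the paper's.

```python
def position_penalty(first_pos, doc_len, alpha=0.5):
    """Illustrative penalty factor: grows with the candidate's first
    occurrence position. alpha (hypothetical) controls sensitivity."""
    return 1.0 + alpha * (first_pos / doc_len)

def final_score(logprob_score, first_pos, doc_len):
    # logprob_score is negative, so multiplying by a factor > 1
    # lowers the ranking of late-appearing candidates.
    return logprob_score * position_penalty(first_pos, doc_len)

# Two candidates with the same generation probability but different
# first-occurrence positions in a 100-token document:
early = final_score(-0.5, first_pos=3, doc_len=100)
late = final_score(-0.5, first_pos=90, doc_len=100)
print(early > late)  # → True: the earlier candidate ranks higher
```

This captures the empirical observation, common in keyphrase extraction, that important phrases tend to appear early in a document.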

In offering an unsupervised, efficient method for keyphrase extraction that leverages the potential of PLMs through prompt-based techniques, this work presents substantial advancements over traditional methodologies. This approach not only enhances the extraction of keyphrases in both short and long documents but does so while remaining adaptable with minimal configuration changes when new, more powerful PLMs become available.
