A Preliminary Empirical Study on Prompt-based Unsupervised Keyphrase Extraction (2405.16571v1)
Abstract: Pre-trained large language models (LLMs) can perform natural language processing downstream tasks by conditioning on human-designed prompts. However, prompt-based approaches often require "prompt engineering": designing different prompts, primarily hand-crafted through laborious trial and error, which demands human intervention and expertise. This makes constructing a prompt-based keyphrase extraction method challenging. We therefore investigate the effectiveness of different prompts on the keyphrase extraction task to verify the impact of cherry-picked prompts on extraction performance. Extensive experimental results on six benchmark keyphrase extraction datasets and different pre-trained LLMs demonstrate that (1) designing complex prompts is not necessarily more effective than designing simple prompts; (2) changing individual keywords in the designed prompts can affect the overall performance; and (3) complex prompts achieve better performance than simple prompts on long documents.
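To make the setup concrete, below is a minimal sketch of how a prompt-based unsupervised keyphrase ranker can be wired up, in the spirit of PromptRank: candidate phrases are scored by the log-likelihood a sequence-to-sequence LLM assigns to them when the document is wrapped in a prompt template. The Flan-T5 checkpoint, the two templates, and the toy candidate list are illustrative assumptions, not the exact configuration evaluated in the paper.

```python
# Sketch of prompt-based keyphrase ranking with a seq2seq LLM, assuming the
# Hugging Face `transformers` and `torch` packages are available. The model
# name, prompt templates, and candidate list are hypothetical examples.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
model.eval()

def candidate_score(document: str, candidate: str, template: str) -> float:
    """Score a candidate phrase by the (negated) token-level cross-entropy the
    model assigns to it when the document is wrapped in the prompt template."""
    prompt = template.format(document=document)
    enc = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**enc, labels=labels)
    # `out.loss` is the mean token cross-entropy, so lower loss = more likely.
    return -out.loss.item()

# Two illustrative templates mirroring the paper's comparison of
# simple vs. complex prompt designs.
SIMPLE = "Document: {document} This document mainly talks about "
COMPLEX = ("Document: {document} Extract the keyphrase that best summarizes "
           "the core topic of the document above: ")

doc = ("Pre-trained language models can perform downstream tasks "
       "by conditioning on human-designed prompts.")
# In practice, candidates are usually noun phrases mined from the document.
candidates = ["pre-trained language models", "downstream tasks", "prompts"]

for template in (SIMPLE, COMPLEX):
    ranked = sorted(candidates, key=lambda c: candidate_score(doc, c, template),
                    reverse=True)
    print(ranked)
```

Swapping the template string while holding the model and candidates fixed is exactly the kind of controlled comparison the study performs; differences in the resulting rankings illustrate how sensitive extraction quality can be to prompt wording.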