SimCKP: Simple Contrastive Learning of Keyphrase Representations (2310.08221v1)

Published 12 Oct 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Keyphrase generation (KG) aims to generate a set of summarizing words or phrases given a source document, while keyphrase extraction (KE) aims to identify them from the text. Because the search space is much smaller in KE, it is often combined with KG to predict keyphrases that may or may not exist in the corresponding document. However, current unified approaches adopt sequence labeling and maximization-based generation that primarily operate at a token level, falling short in observing and scoring keyphrases as a whole. In this work, we propose SimCKP, a simple contrastive learning framework that consists of two stages: 1) An extractor-generator that extracts keyphrases by learning context-aware phrase-level representations in a contrastive manner while also generating keyphrases that do not appear in the document; 2) A reranker that adapts scores for each generated phrase by likewise aligning their representations with the corresponding document. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed approach, which outperforms the state-of-the-art models by a significant margin.
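The two stages in the abstract rest on one idea: score a phrase by how well its representation aligns with the document's representation, instead of by token-level probabilities. The block below is a minimal sketch of that idea under loud assumptions. It takes pre-computed phrase and document embeddings as plain lists of floats. For training it uses a multi-positive InfoNCE loss, a common contrastive objective chosen here for illustration and not necessarily the paper's exact loss. The reranker is a hypothetical linear mix of a generator score and phrase-document cosine similarity. The names `contrastive_loss` and `rerank`, the temperature `tau`, and the weight `alpha` are illustrative assumptions and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(doc: Vector, phrases: Dict[str, Vector],
                     positives: List[str], tau: float = 0.1) -> float:
    """Assumed multi-positive InfoNCE: pull gold phrases toward the
    document embedding and push every other candidate phrase away."""
    logits = {p: cosine(doc, v) / tau for p, v in phrases.items()}
    # log-sum-exp over all candidates, stabilised by the max logit
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(l - m) for l in logits.values()))
    # average negative log-softmax probability of the gold phrases
    return -sum(logits[p] - log_z for p in positives) / len(positives)


def rerank(doc: Vector, generated: Dict[str, Vector],
           gen_scores: Dict[str, float], alpha: float = 0.5) -> List[str]:
    """Hypothetical reranker: mix each generated phrase's generator score
    (assumed already normalised, e.g. to [0, 1]) with its similarity to
    the document, then sort phrases best-first."""
    final = {p: alpha * cosine(doc, v) + (1 - alpha) * gen_scores[p]
             for p, v in generated.items()}
    return sorted(final, key=final.get, reverse=True)


if __name__ == "__main__":
    doc = [1.0, 0.0]
    cands = {"contrastive learning": [0.9, 0.1],
             "keyphrase": [0.8, 0.2],
             "banana": [0.0, 1.0]}
    # gold phrases close to the document give a much lower loss
    print(contrastive_loss(doc, cands, ["contrastive learning", "keyphrase"]))
    print(contrastive_loss(doc, cands, ["banana"]))
    print(rerank(doc, {"phrase a": [1.0, 0.0], "phrase b": [0.0, 1.0]},
                 {"phrase a": 0.2, "phrase b": 0.4}))
```

In this toy run the reranker puts "phrase a" first: its lower generator score (0.2 vs 0.4) is outweighed by its perfect alignment with the document (0.6 vs 0.2 after mixing with `alpha = 0.5`). This mirrors, in simplified form, the abstract's point that phrase-level alignment can correct scores that token-level generation assigns.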

Authors (5)
  1. Minseok Choi (35 papers)
  2. Chaeheon Gwak (2 papers)
  3. Seho Kim (3 papers)
  4. Si Hyeong Kim (1 paper)
  5. Jaegul Choo (161 papers)
Citations (1)