Open Domain Web Keyphrase Extraction Beyond Language Modeling (1911.02671v1)

Published 6 Nov 2019 in cs.CL and cs.IR

Abstract: This paper studies keyphrase extraction in real-world scenarios where documents come from diverse domains and vary in content quality. We curate and release OpenKP, a large-scale open-domain keyphrase extraction dataset with nearly one hundred thousand web documents and expert keyphrase annotations. To handle the variations in domain and content quality, we develop BLING-KPE, a neural keyphrase extraction model that goes beyond language understanding by using the visual presentation of documents and weak supervision from search queries. Experimental results on OpenKP confirm the effectiveness of BLING-KPE and the contributions of its neural architecture, visual features, and search log weak supervision. Zero-shot evaluations on DUC-2001 demonstrate that learning from open-domain data generalizes better than learning from a specific domain.

Authors (5)
  1. Lee Xiong (3 papers)
  2. Chuan Hu (11 papers)
  3. Chenyan Xiong (95 papers)
  4. Daniel Campos (62 papers)
  5. Arnold Overwijk (9 papers)
Citations (59)