Unity in Diversity: Learning Distributed Heterogeneous Sentence Representation for Extractive Summarization (1912.11688v1)

Published 25 Dec 2019 in cs.CL, cs.IR, and cs.LG

Abstract: Automated multi-document extractive text summarization is a widely studied research problem in the field of natural language understanding. Such extractive mechanisms compute, in some form, the worthiness of a sentence to be included in the summary. While conventional approaches rely on hand-crafted, document-independent features to generate a summary, we develop a novel data-driven summarization system called HNet, which exploits the various semantic and compositional aspects latent in a sentence to capture document-independent features. The network learns sentence representations such that salient sentences lie closer in the vector space than non-salient ones. This semantic and compositional feature vector is then concatenated with document-dependent features for sentence ranking. Experiments on the DUC benchmark datasets (DUC-2001, DUC-2002 and DUC-2004) indicate that our model achieves a significant performance gain of around 1.5-2 points in ROUGE score compared with state-of-the-art baselines.
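
The ranking step the abstract describes can be pictured as a small scoring network applied to the concatenation of a learned sentence representation and document-dependent features, with the top-scoring sentences extracted as the summary. The sketch below is a minimal illustration under that reading, not the paper's actual architecture; the class name SentenceScorer, the feature dimensions, and the placeholder inputs are all hypothetical.

import torch
import torch.nn as nn

class SentenceScorer(nn.Module):
    """Scores each sentence from [learned sentence representation ; document-dependent features]."""
    def __init__(self, repr_dim: int, doc_feat_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(repr_dim + doc_feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, sent_repr: torch.Tensor, doc_feats: torch.Tensor) -> torch.Tensor:
        # sent_repr: (num_sentences, repr_dim), doc_feats: (num_sentences, doc_feat_dim)
        x = torch.cat([sent_repr, doc_feats], dim=-1)
        return self.mlp(x).squeeze(-1)  # (num_sentences,) saliency scores

# Usage: score the sentences of one document and take the top-k as the extractive summary.
scorer = SentenceScorer(repr_dim=128, doc_feat_dim=4)
sent_repr = torch.randn(10, 128)   # placeholder for learned semantic/compositional representations
doc_feats = torch.randn(10, 4)     # placeholder for document-dependent features (e.g. position, length)
scores = scorer(sent_repr, doc_feats)
summary_idx = scores.topk(3).indices  # indices of the 3 highest-scoring sentences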

Authors (3)
  1. Abhishek Kumar Singh (18 papers)
  2. Manish Gupta (67 papers)
  3. Vasudeva Varma (47 papers)
Citations (9)