On Extractive and Abstractive Neural Document Summarization with Transformer Language Models (1909.03186v2)

Published 7 Sep 2019 in cs.CL

Abstract: We present a method to produce abstractive summaries of long documents that exceed several thousand words via neural abstractive summarization. We perform a simple extractive step before generating a summary, which is then used to condition the transformer language model on relevant information before being tasked with generating a summary. We show that this extractive step significantly improves summarization results. We also show that this approach produces more abstractive summaries compared to prior work that employs a copy mechanism while still achieving higher ROUGE scores. Note: The abstract above was not written by the authors, it was generated by one of the models presented in this paper.

An Examination of Extractive and Abstractive Neural Document Summarization with Transformer Language Models

The paper investigates the application of transformer language models (TLMs) to document summarization, distinguishing between extractive and abstractive approaches. The authors propose a methodology that combines extractive and abstractive elements to improve summarization of long inputs such as scientific articles.

Methodology

The authors initially address the challenge of handling extensive documents through an innovative two-step strategy:

  1. Extractive Step: Two hierarchical models, namely a sentence pointer network and a sentence classifier, are utilized to identify and extract the most salient sentences from the document. This pragmatic approach serves to condense the document, focusing the transformer model's attention on pertinent content.
  2. Abstractive Step: The extracted sentences are used to condition a transformer language model, which then generates a coherent and concise summary. This stage uses a GPT-like decoder-only architecture that, unlike traditional seq2seq models, does not explicitly separate the problem into encoding and decoding stages (a minimal sketch of the full pipeline follows this list).
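
The sketch below illustrates the extract-then-abstract flow under stated assumptions: a generic sentence scorer stands in for the paper's sentence pointer network / classifier, and a pretrained GPT-2 checkpoint from the Hugging Face transformers library stands in for the paper's trained TLM. The model name, the "TL;DR:" separator, and the token budgets are illustrative choices, not the authors' configuration.

```python
# Illustrative extract-then-abstract pipeline (stand-ins, not the paper's models).
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

def extract_salient_sentences(sentences, scorer, k=10):
    """Extractive step: score each sentence and keep the top-k in document order.
    `scorer` is any callable mapping a sentence to a salience score; it plays
    the role of the paper's sentence pointer network / classifier."""
    ranked = sorted(range(len(sentences)), key=lambda i: scorer(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

def abstractive_summary(extracted_sentences, model_name="gpt2", max_new_tokens=200):
    """Abstractive step: condition a GPT-like language model on the extracted
    sentences and decode a continuation that serves as the summary."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    prompt = " ".join(extracted_sentences) + "\nTL;DR:"  # conditioning context + separator
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True,
                       max_length=1024 - max_new_tokens)
    with torch.no_grad():
        output = model.generate(inputs.input_ids, max_new_tokens=max_new_tokens,
                                do_sample=False)
    # Return only the newly generated tokens, i.e. the summary continuation.
    return tokenizer.decode(output[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)

# Example usage with a trivial length-based scorer as a placeholder:
# extracted = extract_salient_sentences(doc_sentences, scorer=len, k=10)
# print(abstractive_summary(extracted))
```

In the paper, the language model is trained on sequences that place the extracted context before the ground-truth abstract, so summary generation is learned rather than zero-shot; the separator above merely stands in for that learned formatting.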

Empirical Results

The proposed method was validated against several large datasets, including arXiv, PubMed, and bigPatent, demonstrating superior performance compared to existing extractive and abstractive methods. Noteworthy findings include:

  • The TLM conditioned on extracted information achieved higher ROUGE scores relative to prior methods, illustrating improved summarization efficacy.
  • The use of a transformer model without a copy mechanism produced summaries that are more abstractive in nature, with minimal verbatim copying of phrases from the original document (a sketch of these evaluation signals follows this list).
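
A minimal sketch of how these two signals might be computed, assuming the rouge_score package for ROUGE and a novel-n-gram fraction as a proxy for abstractiveness; neither is the authors' exact evaluation code.

```python
# Illustrative evaluation: ROUGE-L and a novel n-gram proxy for abstractiveness.
# The rouge_score package and this proxy are stand-ins, not the paper's exact
# evaluation pipeline.
from rouge_score import rouge_scorer

def rouge_l_f1(reference: str, summary: str) -> float:
    """ROUGE-L F1 between a reference abstract and a generated summary."""
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    return scorer.score(reference, summary)["rougeL"].fmeasure

def novel_ngram_fraction(source: str, summary: str, n: int = 3) -> float:
    """Fraction of summary n-grams that do not appear verbatim in the source;
    higher values indicate a more abstractive (less copied) summary."""
    def ngrams(text: str) -> set:
        tokens = text.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    source_ngrams, summary_ngrams = ngrams(source), ngrams(summary)
    if not summary_ngrams:
        return 0.0
    return len(summary_ngrams - source_ngrams) / len(summary_ngrams)
```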

Implications and Future Directions

From a practical perspective, the proposed approach offers enhanced summarization capabilities that are particularly beneficial in domains that require processing long documents, such as academic publishing and patent documentation. Theoretically, this work illustrates the potential of an extractive step as a preliminary stage that improves the focus and relevance of the contextual input given to transformer models in generation tasks.

Future research directions might explore end-to-end training paradigms that tightly integrate extractive and abstractive components, potentially improving efficiency and summary quality. Additionally, there is the open challenge of ensuring factual correctness in generated summaries, particularly significant in scientific contexts where inaccuracies could have substantial ramifications.

In conclusion, this paper contributes to the growing understanding of how transformer architectures can be adapted and applied to long-document summarization tasks, showcasing their ability to produce high-quality, concise, and abstractive summaries when appropriately conditioned on key document elements using extractive methods.

Authors (4)
  1. Sandeep Subramanian (24 papers)
  2. Raymond Li (24 papers)
  3. Jonathan Pilault (15 papers)
  4. Christopher Pal (97 papers)
Citations (198)