Leveraging Information Bottleneck for Scientific Document Summarization (2110.01280v1)

Published 4 Oct 2021 in cs.CL

Abstract: This paper presents an unsupervised extractive approach to summarizing scientific long documents based on the Information Bottleneck principle. Inspired by previous work that uses the Information Bottleneck principle for sentence compression, we extend it to document-level summarization with two separate steps. In the first step, we use signal(s) as queries to retrieve the key content from the source document. Then, a pre-trained language model conducts further sentence search and editing to return the final extracted summaries. Importantly, our work can be flexibly extended to a multi-view framework by using different signals. Automatic evaluation on three scientific document datasets verifies the effectiveness of the proposed framework. Further human evaluation suggests that the extracted summaries cover more content aspects than those of previous systems.
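The first step described in the abstract (using a signal as a query to retrieve key content under a relevance-versus-compression trade-off) can be sketched in miniature. The snippet below is a hypothetical illustration, not the authors' implementation: it uses token overlap with the query as a crude stand-in for the mutual-information (relevance) term and a length penalty as the compression term, then keeps the highest-scoring sentences in document order. The function name `ib_extract` and the `beta` weighting are assumptions for illustration only.

```python
import re


def ib_extract(sentences, query, beta=0.1, k=2):
    """Toy Information-Bottleneck-style extraction.

    Scores each sentence by relevance to the query signal (token
    overlap, a crude proxy for mutual information) minus beta times
    a length penalty (the compression term), then returns the top-k
    sentences restored to document order.
    """
    q_tokens = set(re.findall(r"\w+", query.lower()))
    scored = []
    for i, s in enumerate(sentences):
        s_tokens = set(re.findall(r"\w+", s.lower()))
        # Relevance: fraction of query tokens covered by the sentence.
        relevance = len(q_tokens & s_tokens) / (len(q_tokens) or 1)
        # Compression penalty: longer sentences cost more to keep.
        score = relevance - beta * len(s_tokens) / 100.0
        scored.append((score, i, s))
    top = sorted(scored, reverse=True)[:k]
    # Re-sort the selected sentences by original position.
    return [s for _, i, s in sorted(top, key=lambda t: t[1])]
```

In the paper's framework, this retrieval stage is followed by a second stage in which a pre-trained language model searches and edits sentences; that stage is not sketched here.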

Authors (6)
  1. Jiaxin Ju (6 papers)
  2. Ming Liu (421 papers)
  3. Huan Yee Koh (10 papers)
  4. Yuan Jin (24 papers)
  5. Lan Du (46 papers)
  6. Shirui Pan (198 papers)
Citations (10)
