
SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section (2408.16444v1)

Published 29 Aug 2024 in cs.CL

Abstract: Document summarization is a task to shorten texts into concise and informative summaries. This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey. Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two specific pipelines to summarize scientific articles into a section of a survey; and (3) the evaluation of these pipelines using multiple metrics to compare their performance. Our results highlight the importance of high-quality retrieval stages and the impact of different configurations on the quality of generated summaries.

SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section

The paper "SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section" by Fernandes et al. presents an innovative contribution to the domain of text summarization. This work addresses a critical gap in domain-specific summarization tools by introducing the SurveySum dataset, specifically designed for summarizing multiple scientific articles into coherent sections of a survey.

Introduction and Problem Statement

Document summarization aims to distill extensive texts into concise, informative summaries. The significance of this task is elevated in the context of scientific literature, where the volume of publications necessitates efficient summarization for comprehensible and accessible synthesis. Traditional summarization methods include extractive and abstractive approaches, each with distinct methodologies and challenges.

The extension to Multi-Document Summarization (MDS) brings additional complexity, requiring the amalgamation of information from varied sources while maintaining coherence and eliminating redundancy. Existing datasets like Multi-News and Multi-XScience adopt this approach in non-scientific and scientific contexts, respectively. However, the authors identify a significant gap in datasets aimed at generating cohesive sections of scientific surveys, which are integral for researchers to capture state-of-the-art developments comprehensively.

Contributions

The authors address this gap through three primary contributions:

  1. SurveySum Dataset: This dataset is constructed by extracting sections from comprehensive surveys in artificial intelligence, natural language processing, and machine learning. These sections, along with the cited scientific articles, form the basis of the dataset, explicitly designed for the MDS task.
  2. Summarization Pipelines: Two specific pipelines are proposed for summarizing scientific articles into survey sections. These pipelines involve stages of document retrieval, chunking of text, and final summary generation using LLMs.
  3. Evaluation Framework: An extensive evaluation of the proposed pipelines using multiple metrics, providing a comparative analysis of their performance.

Methodology

The creation of SurveySum involves meticulously selecting comprehensive surveys based on predefined criteria, parsing these surveys to extract sections and their corresponding citations, and retrieving the full texts of these cited articles. This method ensures that the dataset encapsulates diverse topics while maintaining technical robustness.
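The construction process described above pairs each extracted survey section with the full texts of the articles it cites. A minimal sketch of what one such example might look like follows; the field names are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class SurveySumExample:
    """Hypothetical record: one survey section plus its cited articles."""
    survey_title: str                # title of the source survey
    section_title: str               # heading of the extracted section
    section_text: str                # gold summary: the section as written
    cited_articles: list[str] = field(default_factory=list)  # full texts of cited papers

example = SurveySumExample(
    survey_title="A Survey on Text Summarization",
    section_title="Abstractive Methods",
    section_text="Abstractive approaches generate novel sentences...",
    cited_articles=["Full text of cited paper 1...", "Full text of cited paper 2..."],
)
```

A summarization pipeline would consume `cited_articles` as input and be evaluated against `section_text` as the reference.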

Pipelines

Pipeline 1 employs a monoT5-3B model for retrieving text chunks and uses the gpt-3.5-turbo-0125 model to generate the final summaries. Three configurations were evaluated:

  • Pipeline 1.1: Summarization using 5 chunks.
  • Pipeline 1.2: Summarization using 10 chunks.
  • Pipeline 1.3: Utilizing articles retrieved from the Semantic Scholar API.

Pipeline 2 reranks text chunks using either the SPECTER2 embedding model or gpt-4-0125-preview as the reranker, with gpt-4-0125-preview generating the final summaries:

  • Pipeline 2.1: Summarization using 1 chunk.
  • Pipeline 2.2: Summarization using 5 chunks.
  • Pipeline 2.3: Summarization using 10 chunks.
  • Pipeline 2.4: Utilizing gpt-4-0125-preview for reranking.
  • Pipeline 2.5: Utilizing gpt-4-0125-preview with 5 chunks.
  • Pipeline 2.6: Utilizing gpt-4-0125-preview with 10 chunks.
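Both pipelines share the same retrieve-rerank-generate skeleton: split the cited articles into chunks, score each chunk's relevance to the section topic, and prompt an LLM with the top-k chunks. The sketch below illustrates that structure only; `score_chunk` is a word-overlap placeholder standing in for the paper's actual rerankers (monoT5-3B, SPECTER2, or gpt-4-0125-preview), and the prompt wording is an assumption.

```python
def chunk_text(text: str, size: int = 100) -> list[str]:
    """Split an article into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score_chunk(query: str, chunk: str) -> float:
    """Placeholder relevance score: fraction of query words found in the chunk.
    The paper uses monoT5-3B, SPECTER2 embeddings, or an LLM reranker instead."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / (len(q) or 1)

def build_summary_prompt(section_title: str, articles: list[str], k: int = 5) -> str:
    """Rank all chunks against the section topic and assemble an LLM prompt
    from the top-k. In the real pipeline this prompt goes to the LLM."""
    chunks = [c for a in articles for c in chunk_text(a)]
    top_k = sorted(chunks, key=lambda c: score_chunk(section_title, c), reverse=True)[:k]
    return (f"Write the survey section '{section_title}' using:\n"
            + "\n---\n".join(top_k))
```

Varying `k` corresponds to the 1-, 5-, and 10-chunk configurations evaluated in the paper.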

Evaluation and Results

The evaluation metrics employed include the References F1 Score, G-Eval, and Check-Eval. The results indicate a correlation between the quality of retrieval and the effectiveness of summarization. Notably, the pipeline configurations using articles from SurveySum outperformed those relying on Semantic Scholar retrieval in both G-Eval and Check-Eval scores. Moreover, setups utilizing the gpt-4-0125-preview model consistently yielded superior results compared to those using gpt-3.5-turbo-0125.
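One plausible reading of the References F1 Score is an F1 computed over the set of references cited by the generated section versus those cited by the human-written section. A minimal sketch under that assumption (the paper's exact definition may differ):

```python
def references_f1(predicted_refs: list[str], gold_refs: list[str]) -> float:
    """Hypothetical References F1: set-level precision/recall over cited
    references, comparing a generated section against the gold section."""
    pred, gold = set(predicted_refs), set(gold_refs)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)               # references cited by both
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, a generated section citing `{a, b}` against a gold section citing `{b, c}` scores precision 0.5, recall 0.5, and F1 0.5.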

Implications and Future Work

The introduction of SurveySum and the proposed summarization pipelines provide a robust foundation for advancing MDS in the domain of scientific literature. The findings suggest that high-quality retrieval stages are crucial for generating coherent and accurate summaries. The differential performance of various LLMs underscores the importance of model selection in enhancing summarization quality.

Future research could explore the integration of more sophisticated retrieval mechanisms and the application of these pipelines in other scientific domains. Additionally, improving the granularity and interpretability of evaluation metrics would further augment the benchmarking of summarization models.

In summary, this paper offers a significant contribution to document summarization, particularly in the scientific domain, by addressing the unique challenges of summarizing multiple articles into coherent survey sections. The proposed methodologies and the SurveySum dataset lay the groundwork for future advancements in MDS, with practical implications for efficiently navigating and synthesizing the ever-expanding body of scientific literature.

References (19)
  1. HowSumm: A multi-document summarization dataset derived from WikiHow articles, 2021.
  2. Language Models are Few-Shot Learners, 2020.
  3. SPECTER: Document-level Representation Learning using Citation-informed Transformers. In ACL, 2020.
  4. Multi-News: A large-scale multi-document summarization dataset and abstractive hierarchical model, 2019.
  5. A large-scale multi-document summarization dataset from the wikipedia current events portal, 2020.
  6. SumPubMed: Summarization dataset of PubMed scientific articles. In Jad Kabbara, Haitao Lin, Amandalynne Paullada, and Jannis Vamvas, editors, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 292–303, Online, August 2021. Association for Computational Linguistics.
  7. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547, 2019.
  8. An empirical survey on long document summarization: Datasets, models, and metrics. ACM Computing Surveys, 55(8):1–35, December 2022.
  9. Generating a structured summary of numerous academic papers: Dataset and method. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-2022. International Joint Conferences on Artificial Intelligence Organization, July 2022.
  10. Long text and multi-table summarization: Dataset and method, 2023.
  11. G-Eval: NLG evaluation using GPT-4 with better human alignment. In Conference on Empirical Methods in Natural Language Processing, 2023.
  12. Multi-XScience: A large-scale dataset for extreme multi-document summarization of scientific articles, 2020.
  13. Automatic summarization. Foundations and Trends® in Information Retrieval, 5(2–3):103–233, 2011.
  14. Document ranking with a pretrained sequence-to-sequence model. In Trevor Cohn, Yulan He, and Yang Liu, editors, Findings of the Association for Computational Linguistics: EMNLP 2020, pages 708–718, Online, November 2020. Association for Computational Linguistics.
  15. Document expansion by query prediction, 2019.
  16. Check-Eval: A checklist-based approach for evaluating text quality, 2024.
  17. Okapi at TREC-3. In Donna K. Harman, editor, Proceedings of The Third Text REtrieval Conference, TREC 1994, Gaithersburg, Maryland, USA, November 2-4, 1994, volume 500-225 of NIST Special Publication, pages 109–126. National Institute of Standards and Technology (NIST), 1994.
  18. Semantic Scholar. https://www.semanticscholar.org/.
  19. ScisummNet: A large annotated corpus and content-impact models for scientific paper summarization with citation networks. In Proceedings of AAAI 2019, 2019.
Authors (7)
  1. Leandro Carísio Fernandes
  2. Gustavo Bartz Guedes
  3. Thiago Soares Laitz
  4. Thales Sales Almeida
  5. Rodrigo Nogueira
  6. Roberto Lotufo
  7. Jayr Pereira