
Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles (2010.14235v1)

Published 27 Oct 2020 in cs.CL and cs.AI

Abstract: Multi-document summarization is a challenging task for which few large-scale datasets exist. We propose Multi-XScience, a large-scale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references. Our work is inspired by extreme summarization, a dataset construction protocol that favours abstractive modeling approaches. Descriptive statistics and empirical results, obtained with several state-of-the-art models trained on the Multi-XScience dataset, reveal that Multi-XScience is well suited for abstractive models.
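
The claim that Multi-XScience favours abstractive models rests partly on descriptive statistics. One widely used statistic of this kind is the proportion of n-grams in a reference summary that appear in none of its source documents (here, the query paper's abstract plus the abstracts of the papers it cites). The following is a minimal sketch under stated assumptions: lowercased whitespace tokenization, unique (set-based) n-gram counting, and hypothetical function names `ngrams` and `novel_ngram_ratio`. It is an illustration of the general measure, not necessarily the exact procedure the authors used.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(sources: List[str], target: str, n: int = 2) -> float:
    """Fraction of unique target n-grams that occur in none of the sources.

    Assumption: lowercased whitespace tokenization; real pipelines usually
    use a proper tokenizer.
    """
    # Pool n-grams from every source document (e.g. the query abstract
    # together with the abstracts of the cited papers).
    source_ngrams: Set[Tuple[str, ...]] = set()
    for doc in sources:
        source_ngrams |= ngrams(doc.lower().split(), n)

    target_ngrams = ngrams(target.lower().split(), n)
    if not target_ngrams:
        return 0.0  # empty or too-short target: nothing to compare

    novel = target_ngrams - source_ngrams
    return len(novel) / len(target_ngrams)


if __name__ == "__main__":
    sources = ["we propose a new dataset", "prior work studies news summarization"]
    target = "prior work studies news while we propose scientific data"
    print(novel_ngram_ratio(sources, target, n=2))  # 4 of 8 bigrams are novel
```

A higher ratio means the target reuses less of the source wording, so a purely extractive model that copies source spans has less to gain; averaging this ratio over a dataset gives one rough indicator of how abstractive its summaries are.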

Authors (3)
  1. Yao Lu (212 papers)
  2. Yue Dong (61 papers)
  3. Laurent Charlin (51 papers)
Citations (106)
