TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks (1906.01351v2)
Published 4 Jun 2019 in cs.CL
Abstract: Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers' content, and can form the basis for good summaries. We collected 1716 papers and their corresponding videos, and created a dataset of paper summaries. A model trained on this dataset achieves performance similar to that of models trained on a dataset of manually created summaries. In addition, we had human experts validate the quality of our summaries.
- Guy Lev
- Michal Shmueli-Scheuer
- Jonathan Herzig
- Achiya Jerbi
- David Konopnicki