
TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks (1906.01351v2)

Published 4 Jun 2019 in cs.CL

Abstract: Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers' content, and can form the basis for good summaries. We collected 1716 papers and their corresponding videos, and created a dataset of paper summaries. A model trained on this dataset achieves performance similar to that of models trained on a dataset of manually created summaries. In addition, human experts validated the quality of our summaries.

Authors (5)
  1. Guy Lev (9 papers)
  2. Michal Shmueli-Scheuer (17 papers)
  3. Jonathan Herzig (34 papers)
  4. Achiya Jerbi (4 papers)
  5. David Konopnicki (16 papers)
Citations (49)
