Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Automatic Summarization of Open-Domain Podcast Episodes (2011.04132v2)

Published 9 Nov 2020 in cs.CL

Abstract: We present implementation details of our abstractive summarizers that achieve competitive results on the Podcast Summarization task of TREC 2020. A concise textual summary that captures important information is crucial for users to decide whether to listen to the podcast. Prior work focuses primarily on learning contextualized representations. Instead, we investigate several less-studied aspects of neural abstractive summarization, including (i) the importance of selecting important segments from transcripts to serve as input to the summarizer; (ii) striking a balance between the amount and quality of training instances; (iii) the appropriate summary length and start/end points. We highlight the design considerations behind our system and offer key insights into the strengths and weaknesses of neural abstractive systems. Our results suggest that identifying important segments from transcripts to use as input to an abstractive summarizer is advantageous for summarizing long documents. Our best system achieves a quality rating of 1.559 judged by NIST evaluators---an absolute increase of 0.268 (+21%) over the creator descriptions.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Kaiqiang Song (32 papers)
  2. Chen Li (386 papers)
  3. Xiaoyang Wang (134 papers)
  4. Dong Yu (329 papers)
  5. Fei Liu (232 papers)
Citations (9)