
CNTLS: A Benchmark Dataset for Abstractive or Extractive Chinese Timeline Summarization (2105.14201v2)

Published 29 May 2021 in cs.AI, cs.IR, and cs.LG

Abstract: Timeline summarization (TLS) involves creating summaries of long-running events using dated summaries from numerous news articles. However, limited data availability has significantly slowed the development of timeline summarization. In this paper, we introduce the CNTLS dataset, a versatile resource for Chinese timeline summarization. CNTLS encompasses 77 real-life topics, each with 2524 documents, and its summaries achieve a duration compression of nearly 60% of days on average across all topics. We meticulously analyze the corpus using well-known metrics, focusing on the style of the summaries and the complexity of the summarization task. Specifically, we evaluate the performance of various extractive and generative summarization systems on the CNTLS corpus to provide benchmarks and support further research. To the best of our knowledge, CNTLS is the first Chinese timeline summarization dataset. The dataset and source code are available at https://github.com/OpenSUM/CNTLS.

Authors (6)
  1. Qianren Mao (13 papers)
  2. Jiazheng Wang (17 papers)
  3. Zheng Wang (401 papers)
  4. Xi Li (199 papers)
  5. Bo Li (1108 papers)
  6. Jianxin Li (128 papers)
