Towards Unifying Multi-Lingual and Cross-Lingual Summarization (2305.09220v1)

Published 16 May 2023 in cs.CL and cs.AI

Abstract: To adapt text summarization to the multilingual world, previous work proposes multi-lingual summarization (MLS) and cross-lingual summarization (CLS). However, these two tasks have been studied separately due to their different definitions, which limits compatible and systematic research on both. In this paper, we aim to unify MLS and CLS into a more general setting, i.e., many-to-many summarization (M2MS), where a single model can process documents in any language and generate their summaries also in any language. As the first step towards M2MS, we conduct preliminary studies to show that M2MS can better transfer task knowledge across different languages than MLS and CLS. Furthermore, we propose Pisces, a pre-trained M2MS model that learns language modeling, cross-lingual ability and summarization ability via three-stage pre-training. Experimental results indicate that our Pisces significantly outperforms the state-of-the-art baselines, especially in the zero-shot directions, where there is no training data from the source-language documents to the target-language summaries.
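To make the M2MS setting concrete, here is a minimal sketch of what "one model, any source language, any target language" looks like in practice, using the public mBART-50 checkpoint and its language-tag scheme as a stand-in backbone. This is not the authors' Pisces model or its exact input format: the checkpoint shown is a translation model and would need fine-tuning on summarization data; the snippet only illustrates how a single encoder-decoder covers every (source, target) direction via language tags, including the zero-shot directions the abstract mentions.

```python
# Minimal M2MS sketch: one model handles all (source language, target language)
# summarization directions. Assumes an mBART-style backbone with language tags;
# illustrative only, not the Pisces implementation from the paper.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_name = "facebook/mbart-large-50-many-to-many-mmt"  # public stand-in checkpoint
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

def m2ms_summarize(document: str, src_lang: str, tgt_lang: str) -> str:
    """Summarize `document` (written in src_lang) into tgt_lang.

    src_lang/tgt_lang are mBART-50 codes, e.g. "en_XX", "zh_CN", "de_DE".
    A single model serving every direction is what distinguishes M2MS
    from training separate MLS (same-language) and CLS (cross-language) models.
    """
    tokenizer.src_lang = src_lang  # prepends the source-language tag
    inputs = tokenizer(document, return_tensors="pt",
                       truncation=True, max_length=1024)
    summary_ids = model.generate(
        **inputs,
        # Forcing the target-language tag as the first decoder token
        # selects the output language, enabling zero-shot directions.
        forced_bos_token_id=tokenizer.lang_code_to_id[tgt_lang],
        num_beams=4,
        max_length=128,
    )
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]

# MLS direction: English document -> English summary
# summary = m2ms_summarize(doc, "en_XX", "en_XX")
# CLS direction: English document -> Chinese summary (zero-shot if unseen in training)
# summary = m2ms_summarize(doc, "en_XX", "zh_CN")
```

The design point this illustrates: because source and target languages are just tags, directions never seen during fine-tuning (the paper's zero-shot directions) can still be requested at inference time, which is where the abstract reports Pisces's largest gains.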

Authors (7)
  1. Jiaan Wang (35 papers)
  2. Fandong Meng (174 papers)
  3. Duo Zheng (13 papers)
  4. Yunlong Liang (33 papers)
  5. Zhixu Li (43 papers)
  6. Jianfeng Qu (17 papers)
  7. Jie Zhou (687 papers)
Citations (19)