
SumTra: A Differentiable Pipeline for Few-Shot Cross-Lingual Summarization (2403.13240v1)

Published 20 Mar 2024 in cs.CL

Abstract: Cross-lingual summarization (XLS) generates summaries in a language different from that of the input documents (e.g., English to Spanish), allowing speakers of the target language to gain a concise view of their content. At present, the predominant approach to this task is to take a performant, pretrained multilingual language model (LM) and fine-tune it for XLS on the language pairs of interest. However, the scarcity of fine-tuning samples makes this approach challenging in some cases. For this reason, in this paper we propose revisiting the summarize-and-translate pipeline, where the summarization and translation tasks are performed in sequence. This approach allows reusing the many publicly available resources for monolingual summarization and translation, obtaining very competitive zero-shot performance. In addition, the proposed pipeline is completely differentiable end-to-end, allowing it to take advantage of few-shot fine-tuning where available. Experiments over two contemporary and widely adopted XLS datasets (CrossSum and WikiLingua) have shown the remarkable zero-shot performance of the proposed approach, as well as its strong few-shot performance compared to an equivalent multilingual LM baseline, which the proposed approach has outperformed in many languages with only 10% of the fine-tuning samples.
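The abstract states that the summarize-and-translate pipeline is differentiable end-to-end but does not spell out the mechanism. A common way to achieve this (not necessarily the paper's exact method) is a soft hand-off: instead of feeding the translator the summarizer's argmax token ids (a non-differentiable step), feed it the probability-weighted mixture of token embeddings, through which gradients can flow back into the summarizer. A minimal numpy sketch, with all sizes and names hypothetical:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical tiny setup: 5-token vocabulary, 4-dim embeddings,
# a 3-token intermediate summary.
rng = np.random.default_rng(0)
vocab_size, embed_dim, summary_len = 5, 4, 3
embedding_table = rng.normal(size=(vocab_size, embed_dim))

# Summarizer output: one logit vector per generated summary token.
summary_logits = rng.normal(size=(summary_len, vocab_size))

# Hard hand-off (breaks the gradient): argmax ids, then embedding lookup.
hard_ids = summary_logits.argmax(axis=-1)
hard_inputs = embedding_table[hard_ids]

# Soft hand-off (differentiable): expected embedding under the
# summarizer's output distribution, passed as the translator's input.
probs = softmax(summary_logits, temperature=0.5)
soft_inputs = probs @ embedding_table  # shape: (summary_len, embed_dim)
```

Lowering the temperature pushes `soft_inputs` toward the hard lookup while keeping the chain differentiable, so the translation loss can fine-tune the summarizer even when only a few XLS examples are available.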

References (42)
  1. Zero-shot cross-lingual neural headline generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(12):2319–2327.
  2. Cross-lingual abstractive summarization with limited parallel resources. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6910–6924, Online. Association for Computational Linguistics.
  3. CrossSum: Beyond English-centric cross-lingual abstractive text summarization for 1500+ language pairs.
  4. A graph-based approach to cross-language multi-document summarization. Polibits, 43:113–118.
  5. mT6: Multilingual pretrained text-to-text transformer with translation pairs. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1671–1683, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  6. Alexis Conneau and Guillaume Lample. 2019. Cross-lingual language model pretraining.
  7. A deep reinforced model for zero-shot cross-lingual summarization with bilingual semantic similarity rewards. In Proceedings of the Fourth Workshop on Neural Generation and Translation, pages 60–68, Online. Association for Computational Linguistics.
  8. Parameter-efficient transfer learning for NLP. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2790–2799. PMLR.
  9. The state and fate of linguistic diversity and inclusion in the NLP world. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6282–6293, Online. Association for Computational Linguistics.
  10. Evaluating the efficacy of summarization evaluation across languages. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 801–812, Online. Association for Computational Linguistics.
  11. Controlled text generation as continuous optimization with multiple constraints. In Advances in Neural Information Processing Systems, volume 34, pages 14542–14554. Curran Associates, Inc.
  12. WikiLingua: A new benchmark dataset for cross-lingual abstractive summarization. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4034–4048, Online. Association for Computational Linguistics.
  13. Cross-lingual C*ST*RD: English access to Hindi information. ACM Trans. Asian Lang. Inf. Process., 2:245–269.
  14. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.
  15. Deep reinforcement learning with distributional semantic rewards for abstractive summarization. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6038–6044, Hong Kong, China. Association for Computational Linguistics.
  16. A variational hierarchical model for neural cross-lingual summarization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2088–2099, Dublin, Ireland. Association for Computational Linguistics.
  17. Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
  18. Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8:726–742.
  19. DeltaLM: Encoder-decoder pre-training for language generation and translation by augmenting pretrained multilingual encoders.
  20. Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1797–1807, Brussels, Belgium. Association for Computational Linguistics.
  21. Constantin Orăsan and Oana Andreea Chiorean. 2008. Evaluation of a cross-lingual Romanian-English multi-document summariser. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco. European Language Resources Association (ELRA).
  22. A robust abstractive system for cross-lingual summarization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2025–2031, Minneapolis, Minnesota. Association for Computational Linguistics.
  23. A multi-document coverage reward for RELAXed multi-document summarization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5112–5128, Dublin, Ireland. Association for Computational Linguistics.
  24. Laura Perez-Beltrachini and Mirella Lapata. 2021. Models and datasets for cross-lingual summarisation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9408–9423, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  25. Lifting the curse of multilinguality by pre-training modular transformers. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3479–3495, Seattle, United States. Association for Computational Linguistics.
  26. Learning multiple visual domains with residual adapters.
  27. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1073–1083, Vancouver, Canada. Association for Computational Linguistics.
  28. Adversarial generation of natural language. In Proceedings of the 2nd Workshop on Representation Learning for NLP, pages 241–251, Vancouver, Canada. Association for Computational Linguistics.
  29. Multilingual translation from denoising pre-training. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3450–3466, Online. Association for Computational Linguistics.
  30. T3L: Translate-and-test transfer learning for cross-lingual text classification. Transactions of the Association for Computational Linguistics, 11:1147–1161.
  31. Gido M. van de Ven and Andreas S. Tolias. 2019. Three scenarios for continual learning.
  32. Overcoming catastrophic forgetting in zero-shot cross-lingual generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9279–9300, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  33. Xiaojun Wan. 2011. Using bilingual information for cross-language document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1546–1555, Portland, Oregon, USA. Association for Computational Linguistics.
  34. Cross-language document summarization based on machine translation quality prediction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 917–926, Uppsala, Sweden. Association for Computational Linguistics.
  35. Zero-shot cross-lingual summarization via large language models.
  36. ClidSum: A benchmark dataset for cross-lingual dialogue summarization. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7716–7729, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  37. A survey on cross-lingual summarization. Transactions of the Association for Computational Linguistics, 10:1304–1323.
  38. Towards unifying multi-lingual and cross-lingual summarization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15127–15143, Toronto, Canada. Association for Computational Linguistics.
  39. mT5: A massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 483–498, Online. Association for Computational Linguistics.
  40. BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations.
  41. MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 563–578, Hong Kong, China. Association for Computational Linguistics.
  42. NCLS: Neural cross-lingual summarization. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3054–3064, Hong Kong, China. Association for Computational Linguistics.
Authors (3)
  1. Jacob Parnell (4 papers)
  2. Inigo Jauregi Unanue (13 papers)
  3. Massimo Piccardi (21 papers)
Citations (2)