
ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization (2109.04098v1)

Published 9 Sep 2021 in cs.CL

Abstract: Abstractive text summarization is one of the areas influenced by the emergence of pre-trained language models. Current pre-training approaches for abstractive summarization reward summaries that share more surface words with the source text and pay less attention to the semantic similarity between generated sentences and the original document. We propose ARMAN, a Transformer-based encoder-decoder model pre-trained with three novel objectives to address this issue. In ARMAN, salient sentences from a document are selected according to a modified semantic score, masked, and used to form a pseudo-summary. To summarize more accurately and in a manner closer to human writing patterns, we also apply modified sentence reordering. We evaluated our proposed models on six downstream Persian summarization tasks. Experimental results show that our model achieves state-of-the-art performance on all six tasks as measured by ROUGE and BERTScore. Our models also outperform prior work on textual entailment, question paraphrasing, and multiple-choice question answering. Finally, we conducted a human evaluation and show that using the semantic score significantly improves summarization results.
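The gap-sentence pre-training idea in the abstract lends itself to a short sketch. Below is a minimal, illustrative Python version of semantic salient-sentence selection: each sentence is scored against the rest of the document, the top-scoring sentences are masked out of the input, and their concatenation becomes the pseudo-summary target, with an optional shuffle of the remaining input standing in for the paper's modified sentence reordering. The `embed` placeholder, the plain cosine-similarity score, and the `ratio` and `mask_token` parameters are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of semantic gap-sentence selection for pseudo-summary
# construction, in the spirit of ARMAN's pre-training objectives.
# Assumptions (not from the paper): embed() is a stand-in for any real
# sentence encoder, and the "semantic score" here is plain cosine
# similarity, whereas ARMAN uses a modified variant described in the paper.

import random
import numpy as np

def embed(sentences):
    """Placeholder: return one embedding vector per sentence.
    Swap in any real sentence encoder here."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sentences), 768))

def select_salient(sentences, ratio=0.3):
    """Score each sentence by cosine similarity to the mean embedding of
    the remaining sentences; return indices of the top fraction.
    Assumes the document has at least two sentences."""
    vecs = embed(sentences)
    scores = []
    for i in range(len(sentences)):
        rest = np.delete(vecs, i, axis=0).mean(axis=0)
        cos = vecs[i] @ rest / (np.linalg.norm(vecs[i]) * np.linalg.norm(rest))
        scores.append(cos)
    k = max(1, int(len(sentences) * ratio))
    return sorted(np.argsort(scores)[-k:])

def make_pretraining_pair(sentences, mask_token="[MASK]", reorder=True, seed=0):
    """Mask the selected salient sentences in the source; their
    concatenation becomes the pseudo-summary target. Optionally shuffle
    the masked source so the model must also restore sentence order
    (a stand-in for the paper's modified sentence reordering)."""
    idx = set(select_salient(sentences))
    kept = [mask_token if i in idx else s for i, s in enumerate(sentences)]
    if reorder:
        random.Random(seed).shuffle(kept)
    source = " ".join(kept)
    target = " ".join(sentences[i] for i in sorted(idx))
    return source, target
```

Calling `make_pretraining_pair` on a list of sentences yields a masked, shuffled source string and the selected salient sentences as the target, which an encoder-decoder model would then be trained to generate.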

Authors (4)
  1. Alireza Salemi (21 papers)
  2. Emad Kebriaei (3 papers)
  3. Ghazal Neisi Minaei (1 paper)
  4. Azadeh Shakery (26 papers)
Citations (3)