
Sequence-to-Sequence Resources for Catalan (2202.06871v1)

Published 14 Feb 2022 in cs.CL and cs.AI

Abstract: In this work, we introduce sequence-to-sequence language resources for Catalan, a moderately under-resourced language, for two tasks: summarization and machine translation (MT). We present two new abstractive summarization datasets in the newswire domain. We also introduce a parallel Catalan-English corpus, paired with three brand-new test sets. Finally, we evaluate the presented data with competing state-of-the-art models, and we develop baselines for these tasks using a newly created Catalan BART. We release the resulting resources under an open license to encourage the development of language technology for Catalan.

Authors (5)
  1. Ona de Gibert
  2. Ksenia Kharitonova
  3. Blanca Calvo Figueras
  4. Jordi Armengol-Estapé
  5. Maite Melero