MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset (2010.04480v3)
Published 9 Oct 2020 in cs.CL
Abstract: We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains eleven language pairs, with human labels for up to 10,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, the titles of the articles from which the sentences were extracted, and the neural MT models used to translate the text.
- Marina Fomicheva
- Shuo Sun
- Erick Fonseca
- Chrysoula Zerva
- Frédéric Blain
- Vishrav Chaudhary
- Francisco Guzmán
- Nina Lopatina
- Lucia Specia
- André F. T. Martins