
Quality Estimation without Human-labeled Data (2102.04020v1)

Published 8 Feb 2021 in cs.CL

Abstract: Quality estimation aims to measure the quality of translated content without access to a reference translation. This is crucial for machine translation systems in real-world scenarios where high-quality translation is needed. While many approaches exist for quality estimation, they are based on supervised machine learning and require costly human-labeled data. As an alternative, we propose a technique that does not rely on examples from human annotators and instead uses synthetic training data. We train off-the-shelf architectures for supervised quality estimation on our synthetic data and show that the resulting models achieve performance comparable to models trained on human-annotated data, for both sentence-level and word-level prediction.

Authors (6)
  1. Yi-Lin Tuan
  2. Ahmed El-Kishky
  3. Adithya Renduchintala
  4. Vishrav Chaudhary
  5. Francisco Guzmán
  6. Lucia Specia
Citations (24)