
On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation (2005.01196v3)

Published 3 May 2020 in cs.CL

Abstract: Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual textual similarity. In this paper, we concern ourselves with reference-free machine translation (MT) evaluation where we directly compare source texts to (sometimes low-quality) system translations, which represents a natural adversarial setup for multilingual encoders. Reference-free evaluation holds the promise of web-scale comparison of MT systems. We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER. We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations, namely, (a) a semantic mismatch between representations of mutual translations and, more prominently, (b) the inability to punish "translationese", i.e., low-quality literal translations. We propose two partial remedies: (1) post-hoc re-alignment of the vector spaces and (2) coupling of semantic-similarity based metrics with target-side language modeling. In segment-level MT evaluation, our best metric surpasses reference-based BLEU by 5.7 correlation points.
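The two remedies in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes re-alignment is done with an orthogonal Procrustes projection learned from parallel embeddings (the paper learns a linear re-mapping; Procrustes is one standard choice), and it combines the re-aligned cross-lingual cosine similarity with a target-side language-model log-probability via a hypothetical interpolation weight `lam`.

```python
import numpy as np

def procrustes_align(X, Y):
    """Learn an orthogonal map W minimizing ||XW - Y||_F over parallel
    embedding pairs (rows of X and Y) -- post-hoc re-alignment of the
    source-language space onto the target-language space."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def reference_free_score(src_emb, hyp_emb, W, lm_logprob, lam=0.5):
    """Interpolate re-aligned cross-lingual similarity with a target-side
    LM score, so fluent-but-literal 'translationese' is penalized.
    `lam` and the raw log-prob scale are illustrative assumptions."""
    sim = cosine(src_emb @ W, hyp_emb)
    return lam * sim + (1.0 - lam) * lm_logprob

# Toy demo: the "target" space is a pure rotation of the "source" space,
# so Procrustes should recover the rotation almost exactly.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))            # source-side embeddings
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # hidden rotation
Y = X @ Q                                   # target-side embeddings
W = procrustes_align(X, Y)

sim_raw = cosine(X[0], Y[0])                # before re-alignment
sim_aligned = cosine(X[0] @ W, Y[0])        # after re-alignment (~1.0)
score = reference_free_score(X[0], Y[0], W, lm_logprob=-0.2)
```

In this synthetic setting the re-aligned similarity approaches 1.0 while the raw similarity does not, mirroring the paper's finding that mutual translations are semantically mismatched in the unaligned spaces.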

Authors (6)
  1. Wei Zhao (309 papers)
  2. Goran Glavaš (82 papers)
  3. Maxime Peyrard (33 papers)
  4. Yang Gao (761 papers)
  5. Robert West (154 papers)
  6. Steffen Eger (90 papers)
Citations (62)