Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Scaling up COMETKIWI: Unbabel-IST 2023 Submission for the Quality Estimation Shared Task (2309.11925v1)

Published 21 Sep 2023 in cs.CL

Abstract: We present the joint contribution of Unbabel and Instituto Superior T\'ecnico to the WMT 2023 Shared Task on Quality Estimation (QE). Our team participated on all tasks: sentence- and word-level quality prediction (task 1) and fine-grained error span detection (task 2). For all tasks, we build on the COMETKIWI-22 model (Rei et al., 2022b). Our multilingual approaches are ranked first for all tasks, reaching state-of-the-art performance for quality estimation at word-, span- and sentence-level granularity. Compared to the previous state-of-the-art COMETKIWI-22, we show large improvements in correlation with human judgements (up to 10 Spearman points). Moreover, we surpass the second-best multilingual submission to the shared-task with up to 3.8 absolute points.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (12)
  1. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  2. InfoXLM: An information-theoretic framework for cross-lingual language model pre-training. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3576–3588, Online. Association for Computational Linguistics.
  3. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Online. Association for Computational Linguistics.
  4. The devil is in the errors: Leveraging large language models for fine-grained machine translation evaluation. ArXiv, abs/2308.07286.
  5. MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset. In Proceedings of the Language Resources and Evaluation Conference, pages 4963–4974, Marseille, France. European Language Resources Association.
  6. Andre Martins and Ramon Astudillo. 2016. From softmax to sparsemax: A sparse model of attention and multi-label classification. In International Conference on Machine Learning, pages 1614–1623.
  7. OpenAI. 2023. Gpt-4 technical report.
  8. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana. Association for Computational Linguistics.
  9. Searching for COMETINHO: The little metric that could. In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, pages 61–70, Ghent, Belgium. European Association for Machine Translation.
  10. CometKiwi: IST-unbabel 2022 submission for the quality estimation shared task. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 634–645, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
  11. Quality estimation for machine translation. Synthesis Lectures on Human Language Technologies, 11(1):1–162.
  12. UniTE: Unified translation evaluation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8117–8127, Dublin, Ireland. Association for Computational Linguistics.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (8)
  1. Ricardo Rei (34 papers)
  2. Nuno M. Guerreiro (27 papers)
  3. José Pombal (15 papers)
  4. Daan van Stigt (3 papers)
  5. Marcos Treviso (17 papers)
  6. Luisa Coheur (33 papers)
  7. José G. C. de Souza (12 papers)
  8. André F. T. Martins (113 papers)
Citations (44)