
BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation (2403.03521v1)

Published 6 Mar 2024 in cs.CL

Abstract: Neural machine translation (NMT) has progressed rapidly in the past few years, promising improvements and quality translations for different languages. Evaluation of this task is crucial to determine the quality of the translation. Overall, traditional methods place insufficient emphasis on the actual sense of the translation. We propose a bidirectional semantic-based evaluation method designed to assess the sense distance of the translation from the source text. This approach employs the comprehensive multilingual encyclopedic dictionary BabelNet. By calculating the semantic distance between the source and the back translation of the output, our method offers a quantifiable approach that enables sentence comparison on the same linguistic level. Factual analysis shows a strong correlation between the average evaluation scores generated by our method and human assessments across various machine translation systems for the English-German language pair. Finally, our method offers a new multilingual approach to rank MT systems without the need for parallel corpora.
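The core idea in the abstract, scoring a translation by the semantic distance between the source sentence and the back translation of the output, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: `TOY_GRAPH` is a tiny hand-built graph standing in for BabelNet relations, the back translation is supplied as a ready-made token list, distance is a plain hop count, and the one-to-one word matching is found by brute force.

```python
from collections import deque
from itertools import permutations

# Hypothetical toy relation graph (undirected); a stand-in for BabelNet.
TOY_GRAPH = {
    "dog": {"animal", "puppy"},
    "puppy": {"dog"},
    "animal": {"dog", "cat"},
    "cat": {"animal"},
}

def sense_distance(a, b, graph, max_dist=5):
    """Hop count between two words via BFS; max_dist if no path is found."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist >= max_dist:
            continue  # do not search beyond the cap
        for nxt in graph.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return max_dist

def sentence_distance(source, back, graph, max_dist=5):
    """Mean word distance under the cheapest one-to-one matching.

    Brute force over permutations, so only suitable for short sentences.
    Words left unmatched in the longer sentence cost max_dist each.
    """
    short, long_ = sorted([source, back], key=len)
    if not long_:
        return 0.0
    best = min(
        sum(sense_distance(s, t, graph, max_dist) for s, t in zip(short, perm))
        for perm in permutations(long_, len(short))
    )
    best += (len(long_) - len(short)) * max_dist
    return best / len(long_)

# Source sentence vs. a hypothetical back translation of the MT output.
score = sentence_distance(["the", "dog"], ["the", "puppy"], TOY_GRAPH)
print(score)  # lower means the back translation stays closer in sense
```

Because both sentences are in the source language, the comparison needs no reference translation, which is what allows ranking systems without parallel corpora. A realistic version would replace the toy graph with lookups in a real lexical resource and the brute-force matching with a polynomial-time assignment solver.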

Authors (2)
  1. Carinne Cherf (1 paper)
  2. Yuval Pinter (41 papers)