Human Evaluation of English--Irish Transformer-Based NMT (2403.02366v1)

Published 4 Mar 2024 in cs.CL and cs.AI

Abstract: In this study, a human evaluation is carried out on how hyperparameter settings impact the quality of Transformer-based Neural Machine Translation (NMT) for the low-resourced English--Irish pair. SentencePiece models using both Byte Pair Encoding (BPE) and unigram approaches were appraised. Variations in model architectures included modifying the number of layers, evaluating the optimal number of heads for attention and testing various regularisation techniques. The greatest performance improvement was recorded for a Transformer-optimized model with a 16k BPE subword model. Compared with a baseline Recurrent Neural Network (RNN) model, a Transformer-optimized model demonstrated a BLEU score improvement of 7.8 points. When benchmarked against Google Translate, our translation engines demonstrated significant improvements. Furthermore, a quantitative fine-grained manual evaluation was conducted which compared the performance of machine translation systems. Using the Multidimensional Quality Metrics (MQM) error taxonomy, a human evaluation of the error types generated by an RNN-based system and a Transformer-based system was explored. Our findings show the best-performing Transformer system significantly reduces both accuracy and fluency errors when compared with an RNN-based model.
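The abstract reports that SentencePiece subword models using both BPE and unigram segmentation were appraised, with the strongest results from a 16k BPE model, and that systems were compared using BLEU. The sketch below is not the authors' released code; it only illustrates how such subword models could be trained with the SentencePiece Python library and how a corpus-level BLEU score could be computed with sacrebleu. The corpus file names, model prefixes, and sample sentences are illustrative assumptions.

```python
# A minimal sketch, assuming parallel English/Irish corpus files exist locally;
# file names, prefixes, and sample sentences below are hypothetical.
import sentencepiece as spm
import sacrebleu

# Train a 16k BPE subword model (the configuration the abstract reports as best-performing).
spm.SentencePieceTrainer.train(
    input="train.en,train.ga",        # hypothetical parallel corpus files
    model_prefix="enga_bpe16k",
    vocab_size=16000,
    model_type="bpe",
)

# Train a unigram model of the same vocabulary size for comparison.
spm.SentencePieceTrainer.train(
    input="train.en,train.ga",
    model_prefix="enga_unigram16k",
    vocab_size=16000,
    model_type="unigram",
)

# Segment a sentence with the trained BPE model.
sp = spm.SentencePieceProcessor(model_file="enga_bpe16k.model")
print(sp.encode("Is féidir linn.", out_type=str))

# Corpus-level BLEU for system output against references
# (sacrebleu is one possible scorer; the paper reports BLEU among its metrics).
hypotheses = ["Tá an aimsir go maith inniu ."]
references = [["Tá an aimsir go breá inniu ."]]
print(sacrebleu.corpus_bleu(hypotheses, references).score)
```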

Authors (3)
  1. Séamus Lankford (17 papers)
  2. Haithem Afli (13 papers)
  3. Andy Way (46 papers)
Citations (8)