The Comparison of Translationese in Machine Translation and Human Translation in terms of Translation Relations (2404.08661v1)

Published 27 Mar 2024 in cs.CL

Abstract: This study examines the differences between neural machine translation (NMT) and human translation (HT) through the lens of translation relations. Using HT as the benchmark, it assesses the translation techniques produced by an NMT system and addresses three research questions: how overall translation relations differ between NMT and HT, how each uses non-literal translation techniques, and how the factors influencing the use of specific non-literal techniques vary between them. The research employs two parallel corpora that share the same source texts across nine genres, one translated by NMT and the other by humans. Translation relations in these corpora are manually annotated on aligned pairs, enabling a comparative analysis that draws on linguistic features at the semantic and syntactic levels, such as hypernyms and changes in part of speech. The results indicate that NMT relies on literal translation significantly more than HT across genres. While NMT performs comparably to HT in employing syntactic non-literal translation techniques, it falls behind at the semantic level.

Authors (1)
  1. Fan Zhou (111 papers)