Simpson's Paradox and the Accuracy-Fluency Tradeoff in Translation (2402.12690v2)
Abstract: A good translation should be faithful to the source and should respect the norms of the target language. We address a theoretical puzzle about the relationship between these objectives. On one hand, intuition and some prior work suggest that accuracy and fluency should trade off against each other, and that capturing every detail of the source can only be achieved at the cost of fluency. On the other hand, quality assessment researchers often suggest that accuracy and fluency are highly correlated and difficult for human raters to distinguish (Callison-Burch et al., 2007). We show that the tension between these views is an instance of Simpson's paradox, and that accuracy and fluency are positively correlated at the level of the corpus but trade off at the level of individual source segments. We further suggest that the relationship between accuracy and fluency is best evaluated at the segment (or sentence) level, and that the tradeoff between these dimensions has implications both for assessing translation quality and for developing improved MT systems.
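The corpus-versus-segment reversal described in the abstract can be made concrete with a toy simulation (an illustrative sketch, not the paper's data or method): each source segment has its own difficulty that lifts both accuracy and fluency, while candidate translations of the same segment sit on a tradeoff curve. Pooling across segments then yields a positive correlation even though the within-segment correlation is strongly negative.

```python
# Toy illustration of Simpson's paradox for accuracy vs. fluency.
# Assumptions (not from the paper): segment difficulty shifts both scores;
# within a segment, candidates trade accuracy for fluency along t vs. 1 - t.
import random

random.seed(0)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

corpus_acc, corpus_flu, within = [], [], []
for _ in range(200):                      # 200 source segments
    difficulty = random.gauss(0, 2)       # easy segments lift both scores
    seg_acc, seg_flu = [], []
    for _ in range(20):                   # 20 candidate translations each
        t = random.uniform(0, 1)          # position on the tradeoff curve
        seg_acc.append(difficulty + t + random.gauss(0, 0.1))
        seg_flu.append(difficulty + (1 - t) + random.gauss(0, 0.1))
    within.append(pearson(seg_acc, seg_flu))
    corpus_acc += seg_acc
    corpus_flu += seg_flu

print(f"corpus-level r:       {pearson(corpus_acc, corpus_flu):+.2f}")  # positive
print(f"mean segment-level r: {sum(within) / len(within):+.2f}")        # negative
```

Because between-segment variance (difficulty) dominates the pooled scores, the corpus-level correlation is strongly positive, while averaging the per-segment correlations exposes the underlying tradeoff.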
- Fabio Alves and José Luiz Gonçalves. 2013. Investigating the conceptual-procedural distinction in the translation process: A relevance-theoretic analysis of micro and macro translation units. Target. International Journal of Translation Studies, 25(1):107–124.
- Adequacy–fluency metrics: Evaluating MT in the continuous space model framework. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(3):472–482.
- Findings of the 2016 conference on machine translation (WMT16). In Proceedings of the First Conference on Machine Translation, pages 131–198. Association for Computational Linguistics.
- Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.
- Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz, and Josh Schroeder. 2007. (Meta-) evaluation of machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 136–158.
- English-to-Japanese translation vs. dictation vs. post-editing: Comparing translation modes in a multilingual setting. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pages 4024–4031.
- Michael Carl and M Cristina Toledo Báez. 2019. Machine translation errors and the translation process: A study across different languages. Journal of Specialised Translation, 31:107–132.
- The CRITT translation process research database. In New directions in empirical translation process research, pages 13–54. Springer.
- Approaches to human and machine translation quality assessment. Translation quality assessment: From principles to practice, pages 9–38.
- No language left behind: Scaling human-centered machine translation. arXiv preprint arXiv:2207.04672.
- Ali Darwish. 2008. Optimality in translation. Writescope Publishers.
- Gabriel Armand Djiako. 2019. Lexical ambiguity in machine translation and its impact on the evaluation of output by users. Ph.D. thesis, Saarländische Universitäts-und Landesbibliothek.
- Barbara Dragsted. 2010. Coordination of reading and writing processes in translation: An eye on uncharted territory. In Translation and Cognition, pages 41–62. John Benjamins Publishing Company.
- Beyond English-centric multilingual machine translation. Journal of Machine Learning Research, 22(107):1–48.
- Findings of the 2021 conference on machine translation (WMT21). In Proceedings of the Sixth Conference on Machine Translation, pages 1–88. Association for Computational Linguistics.
- Ana Frankenberg-Garcia. 2022. Can a corpus-driven lexical analysis of human and machine translation unveil discourse features that set them apart? Target, 34(2):278–308.
- Experts, errors, and context: A large-scale study of human evaluation for machine translation. Transactions of the Association for Computational Linguistics, 9:1460–1474.
- Results of the WMT23 metrics shared task: Metrics might be guilty but references are not innocent. In Proceedings of the Eighth Conference on Machine Translation, pages 578–628.
- Results of the WMT21 metrics shared task: Evaluating metrics with expert-based human evaluations on TED and news domain. In Proceedings of the Sixth Conference on Machine Translation, pages 733–774.
- Effects of L1 syntax on L2 translation. Copenhagen Studies in Language, 38:319–336.
- Findings of the 2023 conference on machine translation (WMT23): LLMs are here but not quite there yet. In Proceedings of the Eighth Conference on Machine Translation, pages 1–42.
- Findings of the 2022 conference on machine translation (WMT22). In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 1–45.
- Maria Kunilovskaya. 2023. Translationese indicators for human translation quality estimation (based on English-to-Russian translation of mass-media texts). Ph.D. thesis, University of Wolverhampton.
- Marianna Martindale and Marine Carpuat. 2018. Fluency over adequacy: A pilot study in measuring user trust in imperfect MT. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), pages 13–25.
- Identifying fluently inadequate output in neural and statistical machine translation. In Proceedings of Machine Translation Summit XVII: Research Track, pages 233–243.
- Nitika Mathur. 2021. Robustness in Machine Translation Evaluation. Ph.D. thesis, University of Melbourne.
- Bartolomé Mesa-Lao. 2014. Gaze behaviour on source texts: An exploratory study comparing translation and post-editing. In Post-editing of machine translation: Processes and applications, pages 219–245. Cambridge Scholars Publishing.
- Domain robustness in neural machine translation. In Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), pages 151–164.
- Jean Nitzke. 2019. Problem solving activities in post-editing and translation from scratch: A multi-method study. Language Science Press.
- Dagmara Płońska. 2016. Problems of literality in French-Polish translations of a newspaper article. New Directions in Empirical Translation Process Research: Exploring the CRITT TPR-DB, pages 279–291.
- Thierry Poibeau. 2022. On “human parity” and “super human performance” in machine translation evaluation. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6018–6023.
- Maja Popović. 2020. Relations between comprehensibility and adequacy errors in machine translation output. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 256–264.
- COMET-22: Unbabel-IST 2022 submission for the metrics shared task. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 578–585.
- Cohesive relations in text comprehension and production: An exploratory study comparing translation and post-editing. New Directions in Empirical Translation Process Research: Exploring the CRITT TPR-DB, pages 239–263.
- BLEURT: Learning robust metrics for text generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7881–7892.
- Annette Camilla Sjørup. 2013. Cognitive effort in metaphor translation: An eye-tracking and key-logging study. Frederiksberg: Copenhagen Business School (CBS).
- Predicting machine translation adequacy. In Proceedings of Machine Translation Summit XIII: Papers.
- Semantic structural decomposition for neural machine translation. In Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics, pages 50–57.
- Translation, information theory and cognition. In The Routledge Handbook of Translation and Cognition. Routledge.
- Bram Vanroy. 2021. Syntactic difficulties in translation. Ph.D. thesis, Ghent University.
- Mihaela Vela and Liling Tan. 2015. Predicting machine translation adequacy with document embeddings. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 402–410.
- Translating science fiction in a CAT tool: Machine translation and segmentation settings. Translation & Interpreting, 15(1):216–235.
- Simple and effective noisy channel modeling for neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5696–5701.
- The neural noisy channel. In International Conference on Learning Representations.
- Better document-level machine translation with Bayes' rule. Transactions of the Association for Computational Linguistics, 8:346–360.
- Simpson's bias in NLP training. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 14276–14283.
- Findings of the WMT 2022 shared task on quality estimation. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 69–99.