Towards General Error Diagnosis via Behavioral Testing in Machine Translation (2310.13362v1)
Abstract: Behavioral testing offers a crucial means of diagnosing linguistic errors and assessing the capabilities of NLP models. However, applying behavioral testing to machine translation (MT) systems is challenging, as it generally requires human effort to craft references for evaluating the translation quality of such systems on newly generated test cases. Existing work on behavioral testing of MT systems circumvents this by evaluating translation quality without references, but this restricts diagnosis to specific types of errors, such as incorrect translation of single numeric or currency words. To diagnose general errors, this paper proposes a new Bilingual Translation Pair Generation based Behavior Testing (BTPGBT) framework for conducting behavioral testing of MT systems. The core idea of BTPGBT is to employ a novel bilingual translation pair generation (BTPG) approach that automates the construction of high-quality test cases and their pseudoreferences. Experimental results on various MT systems demonstrate that BTPGBT can provide comprehensive and accurate behavioral testing results for general error diagnosis, which further leads to several insightful findings. Our code and data are available at https://github.com/wujunjie1998/BTPGBT.
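To make the role of pseudoreferences concrete, the following is a minimal sketch of a reference-based behavioral test loop. It is not the authors' implementation: `token_f1` is a toy token-overlap score standing in for a learned quality metric, `toy_translate` is a hypothetical lookup-table "MT system", and the pass criterion (score at or above a fixed `threshold`) is an assumption made for illustration.

```python
from collections import Counter
from typing import Callable, List, Tuple


def token_f1(hypothesis: str, reference: str) -> float:
    """Toy quality score: case-insensitive token-overlap F1 (a stand-in metric)."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both the hypothesis and the reference.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def pass_rate(
    translate: Callable[[str], str],
    cases: List[Tuple[str, str]],
    threshold: float = 0.5,
    score: Callable[[str, str], float] = token_f1,
) -> float:
    """Fraction of (source, pseudoreference) cases whose translation scores
    at or above the threshold (assumed pass criterion)."""
    if not cases:
        return 0.0
    passed = sum(score(translate(src), ref) >= threshold for src, ref in cases)
    return passed / len(cases)


def toy_translate(source: str) -> str:
    """Hypothetical MT system that mistranslates the numeral in the second case."""
    table = {
        "ich habe zwei Hunde": "i have two dogs",
        "ich habe drei Hunde": "i have two dogs",
    }
    return table.get(source, "")


# Each test case pairs a source sentence with its pseudoreference.
cases = [
    ("ich habe zwei Hunde", "I have two dogs"),
    ("ich habe drei Hunde", "I have three dogs"),
]

if __name__ == "__main__":
    print(pass_rate(toy_translate, cases, threshold=0.8))
```

Because every test case carries its own pseudoreference, the same loop can flag any kind of translation error that lowers the score, rather than only the error types a reference-free check is designed for.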
- Junjie Wu
- Lemao Liu
- Dit-Yan Yeung