Gradable ChatGPT Translation Evaluation (2401.09984v2)
Abstract: ChatGPT, an LLM built on large-scale pre-training, has exerted a profound influence on the domain of machine translation. In ChatGPT, a "prompt" is a segment of text or an instruction used to steer the model toward generating a specific category of response. The design of the translation prompt is therefore a key factor, influencing the style, precision, and accuracy of the translation to a certain extent. However, there is no common standard or methodology for designing and selecting translation prompts. Accordingly, this paper proposes a generic taxonomy that defines gradable translation prompts in terms of expression type, translation style, POS information, and explicit statement, thus facilitating the construction of prompts with distinct attributes tailored to various translation tasks. Specific experiments and cases are selected to validate and illustrate the effectiveness of the method.
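The abstract names four gradable dimensions (expression type, translation style, POS information, explicit statement) along which prompts can be composed. The sketch below illustrates one way such a taxonomy could be operationalized; the field values and template wording are illustrative assumptions, not the authors' exact prompt set.

```python
# A minimal sketch of a gradable translation-prompt taxonomy, assuming the four
# dimensions named in the abstract. Concrete attribute values and the template
# wording are hypothetical, not taken from the paper.
from dataclasses import dataclass


@dataclass
class TranslationPrompt:
    expression_type: str  # e.g. "imperative" vs. "interrogative" phrasing (assumed values)
    style: str            # target register, e.g. "formal", "colloquial" (assumed values)
    pos_info: bool        # whether to attach part-of-speech tags to the source text
    explicit: bool        # whether to state the source language explicitly

    def render(self, src_lang: str, tgt_lang: str, text: str, pos_tags: str = "") -> str:
        # Expression type: how the instruction itself is phrased.
        if self.expression_type == "interrogative":
            instruction = f"Could you translate the following text into {tgt_lang}?"
        else:  # imperative
            instruction = f"Translate the following text into {tgt_lang}."
        parts = [instruction, f"Use a {self.style} style."]
        # Explicit statement: name the source language rather than leaving it implicit.
        if self.explicit:
            parts.insert(0, f"The source language is {src_lang}.")
        # POS information: append word-level tags when supplied.
        if self.pos_info and pos_tags:
            parts.append(f"POS tags of the source words: {pos_tags}")
        parts.append(f"Text: {text}")
        return "\n".join(parts)


# Usage: toggle attributes to build prompts of different "grades" and compare
# the translations each one elicits.
prompt = TranslationPrompt("imperative", "formal", pos_info=True, explicit=True)
print(prompt.render("Chinese", "English", "你好，世界", pos_tags="你好/INTJ 世界/NOUN"))
```

Structuring the prompt as a record of independent attributes makes the grading systematic: each dimension can be varied in isolation while the others are held fixed, so differences in output quality can be attributed to a single prompt property.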
Authors: Hui Jiao, Bei Peng, Lu Zong, Xiaojun Zhang, Xinwei Li