(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts (2405.11804v1)

Published 20 May 2024 in cs.CL

Abstract: Recent advancements in machine translation (MT) have significantly enhanced translation quality across various domains. However, the translation of literary texts remains a formidable challenge due to their complex language, figurative expressions, and cultural nuances. In this work, we introduce a novel multi-agent framework based on LLMs for literary translation, implemented as a company called TransAgents, which mirrors the traditional translation publication process by leveraging the collective capabilities of multiple agents to address the intricate demands of translating literary works. To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP). MHP assesses translations from the perspective of monolingual readers of the target language, while BLP uses advanced LLMs to compare translations directly with the original texts. Empirical findings indicate that despite lower d-BLEU scores, translations from TransAgents are preferred by both human evaluators and LLMs over human-written references, particularly in genres requiring domain-specific knowledge. We also highlight the strengths and limitations of TransAgents through case studies and suggest directions for future research.

TransAgents: A Multi-Agent System for Literary Translation

Introduction

Literary translation is often cited as one of the most demanding tasks in machine translation (MT). The difficulty stems from the need to preserve figurative language, cultural references, and stylistic elements. In response to this challenge, the authors introduce TransAgents, a multi-agent system designed specifically for literary translation. This article unpacks the main ideas behind the framework.

The Multi-Agent Setup

TransAgents operates like a virtual company, employing various "agents" to tackle different aspects of translating a literary work, much like a traditional publishing house. Let's break down the key points:

  1. Roles and Responsibilities:
    • Senior and Junior Editors: Oversee the translation process, ensuring the end product aligns with the original text's style and tone.
    • Translators and Localization Specialists: Convert the text while adapting it to the target culture.
    • Proofreaders: Critically review the text to ensure linguistic accuracy.
  2. Collaboration Strategies:
    • Addition-by-Subtraction Collaboration: Two agents work in tandem; one adds as much detail as possible, and the other trims unnecessary parts.
    • Trilateral Collaboration: Three agents take on distinct roles; one generates content, one critiques it, and a third makes the final judgment on quality (a minimal sketch of this loop follows the list).
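
Below is a minimal sketch of how the Trilateral Collaboration loop might be orchestrated. The `chat` helper, the role names, the prompts, and the accept/revise stopping rule are assumptions for illustration only; they are not the paper's exact prompts or agent interfaces.

```python
# Illustrative sketch of the Trilateral Collaboration pattern: one agent
# generates, one critiques, and a third decides when the result is acceptable.

def chat(agent_role: str, prompt: str) -> str:
    """Placeholder for a chat-completion call to an LLM acting as `agent_role`."""
    raise NotImplementedError("wire this to your preferred LLM API")


def trilateral_collaboration(source_chapter: str, max_rounds: int = 3) -> str:
    """Generate, critique, and revise a translation for up to `max_rounds` rounds."""
    translation = chat("Translator", f"Translate into the target language:\n{source_chapter}")
    for _ in range(max_rounds):
        critique = chat(
            "Junior Editor",
            f"Critique this translation against the source.\nSOURCE:\n{source_chapter}\nTRANSLATION:\n{translation}",
        )
        verdict = chat("Senior Editor", f"Given this critique, reply ACCEPT or REVISE:\n{critique}")
        if verdict.strip().upper().startswith("ACCEPT"):
            break
        translation = chat(
            "Translator",
            f"Revise the translation using this critique:\n{critique}\nCURRENT TRANSLATION:\n{translation}",
        )
    return translation
```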

Novel Evaluation Methods

Assessing literary translations isn't as straightforward as evaluating technical documents, and standard metrics like BLEU often fall short. The authors therefore propose two evaluation methods:

  1. Monolingual Human Preference (MHP): Human readers who do not understand the source language evaluate translations to see which version resonates better in terms of readability, fluency, and cultural appropriateness.
  2. Bilingual LLM Preference (BLP): Advanced LLMs compare the translations directly against the original texts, focusing on how well each preserves the essence of the source material (a sketch of this pairwise comparison follows the list).
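
The sketch below shows one way a BLP-style pairwise comparison could be posed to an LLM judge. The `judge` helper and the prompt wording are assumptions, not the paper's exact setup; the order swap is a common precaution against position bias in LLM-as-a-judge evaluations.

```python
import random


def judge(prompt: str) -> str:
    """Placeholder for a call to a strong bilingual LLM judge."""
    raise NotImplementedError("wire this to your preferred LLM API")


def blp_preference(source: str, translation_a: str, translation_b: str) -> str:
    """Return 'A' or 'B' for the translation the judge prefers for `source`."""
    swapped = random.random() < 0.5
    first, second = (translation_b, translation_a) if swapped else (translation_a, translation_b)
    answer = judge(
        "You are a bilingual literary expert. Compare the two translations of the source "
        "passage and reply with a single letter, A or B.\n\n"
        f"SOURCE:\n{source}\n\nTRANSLATION A:\n{first}\n\nTRANSLATION B:\n{second}"
    )
    choice = answer.strip().upper()[:1]
    if swapped:  # map the judge's answer back to the original argument order
        choice = "A" if choice == "B" else "B"
    return choice
```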

Results and Performance

Interestingly, while TransAgents achieved lower d-BLEU scores, its output was favored by both human evaluators and LLMs over the human-written reference translations, particularly in genres that demand domain-specific knowledge, such as historical or culturally grounded fiction. Here are some key takeaways (a short sketch of how d-BLEU is computed follows the list):

  • Preference Results: TransAgents' translations were preferred over both the human references and other machine-generated translations; in BLP evaluations, for instance, TransAgents won by a noticeable margin.
  • Linguistic Diversity: TransAgents excelled in preserving the richness and diversity of the language, producing more vivid and engaging translations.
  • Cost Efficiency: TransAgents reduced translation costs by a factor of roughly 80 compared to employing professional human translators.
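
For context on the metric discrepancy, here is a minimal sketch of a document-level BLEU (d-BLEU) computation, assuming the sacrebleu package is available. d-BLEU scores whole documents (all sentences concatenated) against a single reference, so a free, restructured literary translation can read well yet overlap little with that reference, which is how low d-BLEU and high human preference can coexist.

```python
import sacrebleu


def d_bleu(system_docs: list[str], reference_docs: list[str]) -> float:
    """Document-level BLEU: each list item is one full document (sentences joined)."""
    return sacrebleu.corpus_bleu(system_docs, [reference_docs]).score


# Toy example: an identical document scores 100; a heavy paraphrase scores far
# lower even if a reader would judge it faithful and fluent.
print(d_bleu(["the old monk crossed the frozen river at dawn ."],
             ["the old monk crossed the frozen river at dawn ."]))
```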

Strengths and Limitations

Strengths:

  • High Preference Scores: Despite lower BLEU scores, human judges and LLMs preferred TransAgents' outputs.
  • Cultural Adaptation: The system successfully adapted texts culturally, improving reader engagement.

Limitations:

  • Content Omission: Both TransAgents and other models experienced issues with content omission. Further refinement is needed to ensure no vital content is lost.
  • Consistency: Ensuring consistency across chapters, for example keeping character names and recurring terminology uniform, remains a challenging task (a small illustrative check is sketched below).
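
To make the consistency limitation concrete, here is a small, hypothetical terminology check: it reports which candidate renderings of a source-text entity appear in which chapters, so an entity translated two different ways is easy to spot. The entity list, the variants, and the plain substring matching are all illustrative assumptions, not part of TransAgents.

```python
def rendering_report(chapters: list[str], renderings: dict[str, list[str]]) -> dict[str, dict[int, list[str]]]:
    """Map each entity to the chapters (1-indexed) where each candidate rendering appears."""
    report: dict[str, dict[int, list[str]]] = {}
    for entity, variants in renderings.items():
        per_chapter = {}
        for i, chapter in enumerate(chapters, start=1):
            hits = [v for v in variants if v in chapter]
            if hits:
                per_chapter[i] = hits
        report[entity] = per_chapter
    return report


# Hypothetical example: the same character name rendered two ways across chapters.
chapters = ["Lin Feng drew his sword.", "Forest Wind drew his sword again."]
print(rendering_report(chapters, {"林风": ["Lin Feng", "Forest Wind"]}))
```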

Implications for AI and Future Research

The introduction of multi-agent systems like TransAgents opens new avenues for applying AI in complex linguistic tasks. Here are a few thoughts on future developments:

  • Enhanced Modeling: Optimizing agent roles and improving their integration could further enhance translation quality.
  • Adaptive Evaluation Metrics: Developing more sophisticated metrics that capture the subjective and nuanced nature of literary texts will be essential.
  • Scalability and Versatility: Expanding the system's capabilities to handle other forms of creative writing, such as scripts or poetry, could be tremendously beneficial.

Conclusion

TransAgents demonstrates the potential of multi-agent systems in tackling the nuanced challenges of literary translation. While the system shows promising results in terms of human and AI preferences, it also highlights areas where improvements are necessary. Future research and development could build on these insights to create even more sophisticated translation tools, leveraging the collective intelligence of collaborative AI agents.

Authors
  1. Minghao Wu
  2. Yulin Yuan
  3. Gholamreza Haffari
  4. Longyue Wang