
Prompting PaLM for Translation: Assessing Strategies and Performance (2211.09102v3)

Published 16 Nov 2022 in cs.CL

Abstract: LLMs that have been trained on multilingual but not parallel text exhibit a remarkable ability to translate between languages. We probe this ability in an in-depth study of the Pathways Language Model (PaLM), which has demonstrated the strongest machine translation (MT) performance among similarly-trained LLMs to date. We investigate various strategies for choosing translation examples for few-shot prompting, concluding that example quality is the most important factor. Using optimized prompts, we revisit previous assessments of PaLM's MT capabilities with more recent test sets, modern MT metrics, and human evaluation, and find that its performance, while impressive, still lags that of state-of-the-art supervised systems. We conclude by providing an analysis of PaLM's MT output which reveals some interesting properties and prospects for future work.

Authors (6)
  1. David Vilar
  2. Markus Freitag
  3. Colin Cherry
  4. Jiaming Luo
  5. Viresh Ratnakar
  6. George Foster
Citations (138)