Anti-LM Decoding for Zero-shot In-context Machine Translation (2311.08324v2)

Published 14 Nov 2023 in cs.CL and cs.AI

Abstract: Zero-shot in-context learning is the phenomenon where models can perform a task given only the instructions. However, pre-trained LLMs are known to be poorly calibrated for this setting. One of the most effective approaches to handling this bias is to adopt a contrastive decoding objective, which accounts for the prior probability of generating the next token by conditioning on some context. This work introduces an Anti-Language Model (Anti-LM) objective with a decay factor designed to address the weaknesses of in-context machine translation. We conduct our experiments across 3 model types and sizes, 3 language directions, and for both greedy decoding and beam search ($B=5$). The proposed method outperforms other state-of-the-art decoding objectives, with up to a $20$ BLEU point improvement over the default objective observed in some settings.
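
The abstract does not spell out the exact scoring rule, but the general idea of contrastive/anti-LM decoding with a decay factor can be sketched as follows: at each decoding step, the log-probability of the next token under the full conditional model is penalized by the log-probability assigned by a contrastive "anti-LM" distribution, with the penalty weight decaying over generation steps. The function name, the choice of what the anti-LM term conditions on, and the decay value `gamma` below are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def anti_lm_scores(cond_logits, anti_lm_logits, step, gamma=0.3):
    """Combine conditional and anti-LM log-probabilities for one decoding step.

    cond_logits:    logits for p(y_t | source, target prefix), shape (vocab,)
    anti_lm_logits: logits for the contrastive prior (e.g. conditioning on the
                    source alone), shape (vocab,)
    step:           0-based index of the current target token
    gamma:          decay factor; the penalty fades as generation proceeds
    """
    # Convert logits to log-probabilities (log-softmax).
    log_p_cond = cond_logits - np.logaddexp.reduce(cond_logits)
    log_p_anti = anti_lm_logits - np.logaddexp.reduce(anti_lm_logits)
    # Subtract the decayed anti-LM term from the conditional score.
    return log_p_cond - (gamma ** step) * log_p_anti

# Toy demo with random "logits" over a 10-token vocabulary.
rng = np.random.default_rng(0)
cond = rng.normal(size=10)
anti = rng.normal(size=10)
for t in range(3):
    scores = anti_lm_scores(cond, anti, step=t)
    print(f"step {t}: argmax token = {scores.argmax()}")
```

In a real decoder the adjusted scores would replace the raw log-probabilities inside greedy or beam search; because of the decay factor, the anti-LM penalty is strongest on the first tokens, which is where off-target generation in zero-shot translation typically arises.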
