
On the True Distribution Approximation of Minimum Bayes-Risk Decoding (2404.00752v1)

Published 31 Mar 2024 in cs.CL and cs.AI

Abstract: Minimum Bayes-risk (MBR) decoding has recently gained renewed attention in text generation. MBR decoding treats texts sampled from a model as pseudo-references and selects the text most similar to the others. Sampling is therefore a key element of MBR decoding, and previous studies have reported that performance varies with the sampling method. From a theoretical standpoint, this variation is likely tied to how closely the samples approximate the true distribution of references. However, this approximation has not been studied in depth. In this study, we propose using anomaly detection to measure the degree of approximation. We first closely examine the performance variation and then show that previous hypotheses about the samples do not correlate well with the variation, whereas our anomaly scores do. These results are the first to empirically support the link between performance and the core assumption of MBR decoding.
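The sampling-based MBR procedure the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `unigram_f1` is a hypothetical stand-in utility (MBR work in this line typically uses neural metrics such as COMET), and `mbr_decode` simply scores each candidate by its average similarity to the other samples, which act as pseudo-references.

```python
# Minimal sketch of sampling-based MBR decoding.
# Assumption: a toy unigram-F1 utility stands in for the neural
# similarity metrics (e.g., COMET) used in practice.
from collections import Counter

def unigram_f1(hyp: str, ref: str) -> float:
    """Toy similarity: F1 over unigram multisets of the two texts."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())          # multiset intersection size
    total = sum(h.values()) + sum(r.values())
    return 2 * overlap / total if total else 0.0

def mbr_decode(samples: list[str], utility=unigram_f1) -> str:
    """Return the sample with the highest average utility against
    the other samples, which serve as pseudo-references."""
    best, best_score = samples[0], float("-inf")
    for i, hyp in enumerate(samples):
        score = sum(utility(hyp, ref)
                    for j, ref in enumerate(samples) if j != i)
        score /= max(len(samples) - 1, 1)
        if score > best_score:
            best, best_score = hyp, score
    return best
```

Because the selected text only needs to agree with the other samples, the quality of the whole sample set (how well it approximates the true reference distribution) directly bounds how good the selection can be, which is the link the paper investigates.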

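The abstract does not spell out which anomaly-detection method is used, so the following is only a generic sketch of the idea: score how "anomalous" a held-out reference looks relative to the model's samples via its average distance to its k nearest samples in some embedding space. The function name `knn_score` and the 2-D toy points are assumptions for illustration; lower scores mean the references sit inside the sample distribution, i.e., the samples approximate the true distribution well.

```python
# Generic kNN-distance anomaly score (illustrative; not necessarily
# the paper's exact method). `point` is an embedding of a reference,
# `samples` are embeddings of model samples.
import math

def knn_score(point, samples, k=3):
    """Average distance from `point` to its k nearest sample embeddings.
    Lower = the reference looks 'normal' under the sample distribution."""
    dists = sorted(math.dist(point, s) for s in samples)
    k = min(k, len(dists))
    return sum(dists[:k]) / k
```

Averaging such scores over a set of true references gives a single number for how well a sampling method covers the reference distribution, which is the kind of quantity the paper correlates with MBR performance.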
Authors (4)
  1. Atsumoto Ohashi (8 papers)
  2. Ukyo Honda (13 papers)
  3. Tetsuro Morimura (18 papers)
  4. Yuu Jinnai (21 papers)
Citations (2)