EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation (2403.00144v2)

Published 29 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Zero-shot translation ability emerges when a multilingual model is trained on certain translation directions: the model can then translate directly in unseen directions. Alternatively, zero-shot translation can be accomplished by pivoting through a third language (e.g., English). In our work, we observe that both direct and pivot translations are noisy and achieve unsatisfactory performance. We propose EBBS, an ensemble method with a novel bi-level beam search algorithm, where each ensemble component explores its own predictions step by step at the lower level, while the components are synchronized by a "soft voting" mechanism at the upper level. Results on two popular multilingual translation datasets show that EBBS consistently outperforms direct and pivot translations as well as existing ensemble techniques. Further, the ensemble's knowledge can be distilled back into the multilingual model to improve inference efficiency; notably, this EBBS-based distillation does not sacrifice, and may even improve, translation quality.
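
The core of the method is the interplay between the two levels of search. As a rough illustration only, the following minimal Python sketch shows one plausible reading of bi-level beam search with soft voting. The interface is assumed, not taken from the paper: each ensemble component (e.g., a direct model or a pivot path) is modeled as a callable mapping a token-prefix tuple to a dict of next-token log-probabilities, and the names ebbs_sketch, BOS, and EOS are hypothetical placeholders.

    import math
    import heapq

    BOS, EOS = "<s>", "</s>"

    def ebbs_sketch(components, beam_size=4, max_len=50):
        """Illustrative bi-level beam search with soft voting (not the authors' code).

        `components` is a list of callables, each mapping a token-prefix tuple
        to a dict {next_token: log-probability}; this interface is an assumption
        made for the sake of a self-contained example.
        """
        # Shared upper-level beam: (accumulated vote score, token sequence).
        beam = [(0.0, (BOS,))]
        for _ in range(max_len):
            candidates = {}
            for score, seq in beam:
                if seq[-1] == EOS:            # finished hypotheses carry over
                    candidates[seq] = score
                    continue
                dists = [comp(seq) for comp in components]
                # Lower level: each component nominates its own top continuations.
                nominated = set()
                for d in dists:
                    nominated.update(tok for tok, _ in heapq.nlargest(
                        beam_size, d.items(), key=lambda kv: kv[1]))
                # Upper level: "soft voting" scores each nominee by its mean
                # probability across all components.
                for tok in nominated:
                    p = sum(math.exp(d.get(tok, float("-inf"))) for d in dists)
                    p /= len(components)
                    cand = seq + (tok,)
                    cand_score = score + math.log(max(p, 1e-12))
                    if cand_score > candidates.get(cand, float("-inf")):
                        candidates[cand] = cand_score
            # Synchronize: all components continue from the same top-k beam.
            beam = heapq.nlargest(
                beam_size, ((s, seq) for seq, s in candidates.items()))
            if all(seq[-1] == EOS for _, seq in beam):
                break
        return max(beam)[1]   # highest-scoring sequence

The contrast with standard ensembling is the point of the sketch: vanilla ensembles average the components' distributions at every step over a single shared beam, whereas here each component first nominates its own candidate continuations (lower level) before the averaged soft vote selects and synchronizes the shared beam (upper level). The distillation step mentioned in the abstract plausibly proceeds along the lines of sequence-level knowledge distillation, i.e., decoding training inputs with the ensemble and fine-tuning the multilingual model on the resulting outputs, though the abstract does not spell out the exact procedure.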

