Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study
Abstract: Neural Machine Translation (NMT) remains a formidable challenge, especially for low-resource languages. Pre-trained sequence-to-sequence (seq2seq) multilingual models, such as mBART-50, have demonstrated impressive performance on various low-resource NMT tasks. However, their pre-training is confined to 50 languages, leaving out numerous low-resource languages, particularly those spoken in the Indian subcontinent. Expanding mBART-50's language support requires complex continued pre-training and risks performance decline due to catastrophic forgetting. Given these challenges, this paper explores a framework that leverages the benefits of a pre-trained language model together with knowledge distillation in a seq2seq architecture to facilitate translation for low-resource languages, including those not covered by mBART-50. The proposed framework employs a multilingual encoder-based seq2seq model as the foundational architecture and then applies complementary knowledge distillation techniques to mitigate the impact of imbalanced training. We evaluate the framework on three low-resource Indic languages in four Indic-to-Indic directions, yielding significant BLEU-4 and chrF improvements over baselines. Further, we conduct a human evaluation to confirm the effectiveness of our approach. Our code is publicly available at https://github.com/raypretam/Two-step-low-res-NMT.
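To make the distillation component concrete, the following is a minimal sketch of a word-level knowledge-distillation objective of the kind commonly used in NMT: the student's cross-entropy loss on gold tokens is interpolated with a KL-divergence term toward the teacher's softened output distribution. The interpolation weight `alpha`, the temperature, and the exact loss form are illustrative assumptions for exposition, not the paper's specific objective.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, target_ids,
                      alpha=0.5, temperature=2.0, pad_id=0):
    """Interpolate gold-token cross-entropy with KL divergence toward
    the teacher's softened distribution (word-level KD sketch).

    student_logits, teacher_logits: (batch, seq_len, vocab)
    target_ids: (batch, seq_len) gold token ids, pad_id positions ignored.
    """
    vocab = student_logits.size(-1)
    # Standard NMT cross-entropy on the gold references.
    ce = F.cross_entropy(student_logits.view(-1, vocab),
                         target_ids.view(-1), ignore_index=pad_id)
    # Softened teacher/student distributions; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kd
```

With `alpha=1.0` the loss reduces to ordinary NMT training; lowering `alpha` shifts weight toward imitating the teacher, which is one way a stronger model's knowledge can compensate for imbalanced or scarce parallel data.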