What Do Compressed Multilingual Machine Translation Models Forget? (2205.10828v4)

Published 22 May 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Recently, very large pre-trained models have achieved state-of-the-art results on various NLP tasks, but their size makes it challenging to apply them in resource-constrained environments. Compression techniques make it possible to drastically reduce the size of such models, and therefore their inference time, with negligible impact on top-tier metrics. However, performance averaged across multiple tasks and/or languages may hide a drastic drop on under-represented features, which could amplify the biases encoded by the models. In this work, we assess the impact of compression methods on Multilingual Neural Machine Translation (MNMT) models for various language groups and for gender and semantic biases, through an extensive analysis of compressed models on different machine translation benchmarks, i.e., FLORES-101, MT-Gender, and DiBiMT. We show that the performance of under-represented languages drops significantly, while the average BLEU metric decreases only slightly. Interestingly, the removal of noisy memorization through compression leads to a significant improvement for some medium-resource languages. Finally, we demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages. Code: https://github.com/alirezamshi/bias-compressedMT
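To make the setup concrete, below is a minimal sketch of the kind of pipeline the paper studies: apply a compression method (here, unstructured magnitude pruning, one common choice) to a multilingual translation model and score the compressed model with BLEU. The checkpoint name, pruning ratio, language pair, and toy sentences are illustrative assumptions, not the authors' exact configuration, which is documented in their repository.

```python
# Minimal sketch: magnitude-prune a multilingual MT model and score it with BLEU.
# The model name, 30% pruning ratio, and toy sentences are illustrative assumptions,
# not the paper's exact experimental setup.
import torch
import torch.nn.utils.prune as prune
import sacrebleu
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = "facebook/m2m100_418M"  # assumed checkpoint for illustration
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

# Unstructured magnitude pruning: zero out the 30% smallest-magnitude weights
# in every linear layer, then make the sparsity permanent.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")

def translate(sentences, src_lang, tgt_lang):
    """Translate a batch of sentences with the (compressed) model."""
    tokenizer.src_lang = src_lang
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    generated = model.generate(
        **batch, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

# Toy evaluation; in practice one would score a full benchmark such as FLORES-101.
sources = ["The doctor asked the nurse to help her."]
references = ["Le médecin a demandé à l'infirmière de l'aider."]
hypotheses = translate(sources, src_lang="en", tgt_lang="fr")
print("BLEU:", sacrebleu.corpus_bleu(hypotheses, [references]).score)
```

Running this kind of loop per language over a benchmark like FLORES-101 is what exposes the gap the paper highlights: the average BLEU of a pruned model barely moves, while scores for under-represented languages can drop sharply.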

Authors (6)
  1. Alireza Mohammadshahi (13 papers)
  2. Vassilina Nikoulina (28 papers)
  3. Alexandre Berard (20 papers)
  4. Caroline Brun (7 papers)
  5. James Henderson (52 papers)
  6. Laurent Besacier (76 papers)
Citations (9)

