Order Matters in the Presence of Dataset Imbalance for Multilingual Learning (2312.06134v1)

Published 11 Dec 2023 in cs.CL and cs.LG

Abstract: In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance. We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks. We provide a thorough empirical study and analysis of this method's benefits, showing that it achieves consistent improvements relative to the performance trade-off profile of standard static weighting. We analyze under what data regimes this method is applicable and show its improvements empirically in neural machine translation (NMT) and multilingual language modeling.
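
To make the recipe in the abstract concrete, the sketch below contrasts standard static temperature-based weighting with a two-stage schedule: sample only from high-resource tasks for an initial budget of steps, then switch to sampling from the full high/low-resource mixture. This is a minimal illustration only; the task names, corpus sizes, high-resource threshold, temperature, and step counts are assumptions for the sketch, not the paper's experimental settings.

```python
import random

# Illustrative per-task corpus sizes (examples per task); the task names and
# counts are invented for this sketch, not taken from the paper.
TASK_SIZES = {"en-fr": 40_000_000, "en-de": 35_000_000, "en-gl": 50_000}

HIGH_RESOURCE_THRESHOLD = 1_000_000  # assumed cutoff between high/low resource


def static_weights(task_sizes, temperature=5.0):
    """Standard static weighting: sample tasks proportional to size^(1/T)."""
    scaled = {t: n ** (1.0 / temperature) for t, n in task_sizes.items()}
    total = sum(scaled.values())
    return {t: w / total for t, w in scaled.items()}


def sample_task(step, pretrain_steps, task_sizes):
    """Two-stage schedule: pre-train on high-resource tasks only, then
    fine-tune on a mixture of all tasks under static weighting."""
    if step < pretrain_steps:
        active = {t: n for t, n in task_sizes.items()
                  if n >= HIGH_RESOURCE_THRESHOLD}
    else:
        active = task_sizes
    weights = static_weights(active)
    (task,) = random.choices(list(weights), weights=list(weights.values()))
    return task


if __name__ == "__main__":
    # Which task a batch would be drawn from before and after the switch point.
    for step in (0, 99_999, 100_000):
        print(step, sample_task(step, pretrain_steps=100_000,
                                task_sizes=TASK_SIZES))
```

The key design point is that only the sampling distribution changes at the switch: the pre-training stage excludes low-resource tasks entirely, and fine-tuning reintroduces them, rather than reweighting all tasks with a single static distribution throughout training.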
