Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation (2404.08491v1)
Abstract: Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-lingual tasks, yet significant performance disparities exist across languages within the same mPLM. Previous studies tried to narrow these disparities by supervised fine-tuning of the mPLMs with multilingual data. However, obtaining labeled multilingual data is time-consuming, and fine-tuning an mPLM with limited labeled multilingual data merely captures knowledge specific to that labeled data. We therefore introduce ALSACE, which leverages the knowledge learned by well-performing languages to guide under-performing ones within the same mPLM, eliminating the need for additional labeled multilingual data. Experiments show that ALSACE effectively mitigates language-level performance disparity across various mPLMs while achieving competitive results on a range of multilingual NLU tasks, from full-resource to limited-resource settings. The code for our approach is available at https://github.com/pkunlp-icler/ALSACE.
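The abstract describes the method only at a high level, so the following is a minimal PyTorch sketch of the two ingredients it names, teacher language selection and cross-lingual self-distillation. It assumes teachers are chosen by agreement with the consensus prediction over unlabeled parallel inputs and that student languages are trained with a KL distillation loss toward the averaged teacher distribution; the function names, the consensus-agreement criterion, and the temperature are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def select_teacher_languages(per_language_logits, top_k=2):
    """Rank languages by how closely their predictions match the consensus
    (average) distribution over all languages and keep the top_k as teachers.
    The consensus-agreement criterion is an illustrative assumption."""
    probs = {lang: F.softmax(logits, dim=-1)
             for lang, logits in per_language_logits.items()}
    consensus = torch.stack(list(probs.values())).mean(dim=0)  # (batch, classes)
    # Lower KL(consensus || p_lang) means the language agrees more with the consensus.
    divergence = {lang: F.kl_div(p.log(), consensus, reduction="batchmean").item()
                  for lang, p in probs.items()}
    return sorted(divergence, key=divergence.get)[:top_k]

def cross_lingual_self_distillation_loss(per_language_logits, teacher_langs,
                                         temperature=2.0):
    """Distill the averaged teacher-language distribution into every other
    (student) language on the same unlabeled inputs via a KL loss."""
    teacher_probs = torch.stack(
        [F.softmax(per_language_logits[t] / temperature, dim=-1)
         for t in teacher_langs]
    ).mean(dim=0).detach()  # soft targets; no gradient through the teachers
    student_langs = [l for l in per_language_logits if l not in teacher_langs]
    loss = torch.tensor(0.0)
    for lang in student_langs:
        student_log_probs = F.log_softmax(
            per_language_logits[lang] / temperature, dim=-1)
        loss = loss + F.kl_div(student_log_probs, teacher_probs,
                               reduction="batchmean")
    return loss / max(len(student_langs), 1)

# Toy usage: 4 languages, a batch of 8 parallel examples, 3-way (NLI-style) logits.
if __name__ == "__main__":
    torch.manual_seed(0)
    logits = {lang: torch.randn(8, 3, requires_grad=True)
              for lang in ["en", "de", "sw", "ur"]}
    teachers = select_teacher_languages(logits, top_k=2)
    loss = cross_lingual_self_distillation_loss(logits, teachers)
    print("teachers:", teachers, "| distillation loss:", loss.item())
```

Detaching the teacher distribution keeps the well-performing languages as fixed soft targets, so gradients only pull the under-performing languages toward them, which is consistent with the abstract's goal of guiding weaker languages without labeled multilingual data.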
Authors: Haozhe Zhao, Zefan Cai, Shuzheng Si, Liang Chen, Yufeng He, Kaikai An, Baobao Chang