Multilingual BERT Post-Pretraining Alignment (2010.12547v2)

Published 23 Oct 2020 in cs.CL

Abstract: We propose a simple method to align multilingual contextual embeddings as a post-pretraining step for improved zero-shot cross-lingual transferability of the pretrained models. Using parallel data, our method aligns embeddings on the word level through the recently proposed Translation Language Modeling (TLM) objective as well as on the sentence level via contrastive learning and random input shuffling. We also perform sentence-level code-switching with English when finetuning on downstream tasks. On XNLI, our best model (initialized from mBERT) improves over mBERT by 4.7% in the zero-shot setting and achieves comparable results to XLM for translate-train while using less than 18% of the same parallel data and 31% fewer model parameters. On MLQA, our model outperforms XLM-R_Base, which has 57% more parameters than ours.
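To make the sentence-level alignment step concrete, below is a minimal sketch of a contrastive alignment loss over parallel sentence pairs, assuming mBERT [CLS] embeddings and an InfoNCE-style objective. This is not the authors' released code: the names `embed`, `contrastive_alignment_loss`, the temperature value, and the batch layout are illustrative assumptions, and the paper's random input shuffling and word-level TLM objective are omitted for brevity.

```python
# Hypothetical sketch (assumptions noted above), not the paper's implementation.
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
encoder = BertModel.from_pretrained("bert-base-multilingual-cased")

def embed(sentences):
    """Encode sentences into L2-normalized [CLS] vectors from mBERT."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state[:, 0]  # [CLS] token representation
    return F.normalize(hidden, dim=-1)

def contrastive_alignment_loss(src_sents, tgt_sents, temperature=0.05):
    """InfoNCE-style loss: pull each source sentence toward its translation,
    push it away from the other translations in the batch."""
    src, tgt = embed(src_sents), embed(tgt_sents)
    logits = src @ tgt.T / temperature           # pairwise cosine similarities
    labels = torch.arange(len(src_sents))        # i-th source matches i-th target
    return F.cross_entropy(logits, labels)

# Example parallel mini-batch (English / German)
loss = contrastive_alignment_loss(
    ["The cat sleeps.", "I like coffee."],
    ["Die Katze schläft.", "Ich mag Kaffee."],
)
loss.backward()
```

In a full post-pretraining run, this sentence-level loss would be combined with the word-level TLM objective over the same parallel data before finetuning on the downstream task.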

Authors (6)
  1. Lin Pan
  2. Chung-Wei Hang
  3. Haode Qi
  4. Abhishek Shah
  5. Saloni Potdar
  6. Mo Yu
Citations (43)