Cross-Lingual Supervision improves Large Language Models Pre-training (2305.11778v1)

Published 19 May 2023 in cs.CL and cs.LG

Abstract: The recent rapid progress in pre-training Large Language Models has relied on using self-supervised language modeling objectives like next token prediction or span corruption. On the other hand, Machine Translation Systems are mostly trained using cross-lingual supervision that requires aligned data between source and target languages. We demonstrate that pre-training Large Language Models on a mixture of a self-supervised language modeling objective and the supervised Machine Translation objective, therefore including cross-lingual parallel data during pre-training, yields models with better in-context learning abilities. As pre-training is a very resource-intensive process and a grid search on the best mixing ratio between the two objectives is prohibitively expensive, we propose a simple yet effective strategy to learn it during pre-training.
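
A minimal sketch of the mixing idea described in the abstract, assuming the two objectives are combined by sampling each training batch from either monolingual language-modeling data or parallel translation data. The toy model, the fake_batch data generator, and the fixed mix_ratio value are illustrative placeholders; the abstract does not describe the authors' actual model or their strategy for learning the ratio during pre-training.

```python
# Sketch only: mixing a self-supervised LM objective with a supervised MT
# objective during pre-training. Model, data, and the mixing-ratio handling
# are placeholders, not the paper's implementation.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, d_model),
                      nn.Linear(d_model, vocab_size))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def fake_batch(batch=8, seq=16):
    """Stand-in for real monolingual text or aligned source/target pairs."""
    x = torch.randint(0, vocab_size, (batch, seq))
    y = torch.randint(0, vocab_size, (batch, seq))
    return x, y

mix_ratio = 0.2  # fraction of batches drawn from the MT (parallel) objective

for step in range(100):
    if torch.rand(()).item() < mix_ratio:
        # Supervised MT objective: predict target-language tokens from
        # (here, randomly generated) aligned source/target sequences.
        src, tgt = fake_batch()
        logits = model(src)
        loss = loss_fn(logits.reshape(-1, vocab_size), tgt.reshape(-1))
    else:
        # Self-supervised LM objective: next-token prediction on monolingual text.
        tokens, _ = fake_batch()
        logits = model(tokens[:, :-1])
        loss = loss_fn(logits.reshape(-1, vocab_size),
                       tokens[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # The paper learns the mixing ratio during pre-training instead of running
    # a grid search; the update rule is not given in the abstract, so the
    # ratio is left fixed here.
```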

Authors (3)
  1. Andrea Schioppa (21 papers)
  2. Xavier Garcia (36 papers)
  3. Orhan Firat (80 papers)
Citations (11)
