LangBridge: Multilingual Reasoning Without Multilingual Supervision (2401.10695v2)

Published 19 Jan 2024 in cs.CL

Abstract: We introduce LangBridge, a zero-shot approach to adapt LLMs for multilingual reasoning tasks without multilingual supervision. LangBridge operates by bridging two models, each specialized in different aspects: (1) one specialized in understanding multiple languages (e.g., mT5 encoder) and (2) one specialized in reasoning (e.g., MetaMath). LangBridge connects the two models by introducing minimal trainable parameters between them. Despite utilizing only English data for training, LangBridge considerably enhances the performance of LLMs on low-resource languages across mathematical reasoning, code completion, logical reasoning, and commonsense reasoning. Our analysis suggests that the efficacy of LangBridge stems from the language-agnostic characteristics of multilingual representations. We publicly release our code and models.


Summary

  • The paper introduces LangBridge, a method that enables multilingual reasoning without requiring multilingual training data.
  • It aligns a language model with a multilingual encoder using minimal trainable parameters to improve performance in low-resource languages.
  • Empirical results show that pairing MetaMath with LangBridge yields accuracy comparable to much larger models like PaLM-540B.

Introduction

The paper introduces LangBridge, an approach for adapting language models (LMs) to multilingual reasoning tasks without multilingual training data. LangBridge bridges two models, one specialized in multilingual understanding and the other in reasoning, by connecting them with minimal trainable parameters. In contrast to prior methods that required substantial multilingual supervision, LangBridge relies solely on English data during training while still achieving zero-shot cross-lingual transfer.
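The bridging setup can be pictured with a short sketch. The following is a minimal, illustrative PyTorch/Transformers rendering rather than the paper's released implementation: the model names, the single linear projection, and the simple concatenation of encoder states with token embeddings are assumptions made for clarity.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoModelForCausalLM

class BridgedLM(nn.Module):
    """Frozen multilingual encoder + frozen reasoning LM, joined by a small
    trainable projection (an illustrative sketch, not the released code)."""

    def __init__(self, enc_name="google/mt5-xl",
                 lm_name="meta-math/MetaMath-7B-V1.0"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(enc_name).encoder  # mT5 encoder stack only
        self.lm = AutoModelForCausalLM.from_pretrained(lm_name)
        for p in self.encoder.parameters():
            p.requires_grad = False   # multilingual understanding stays frozen
        for p in self.lm.parameters():
            p.requires_grad = False   # reasoning ability stays frozen
        # The only trainable piece: map encoder states into the LM's embedding space.
        self.proj = nn.Linear(self.encoder.config.d_model, self.lm.config.hidden_size)

    def forward(self, enc_input_ids, enc_attention_mask, lm_input_ids):
        enc_out = self.encoder(input_ids=enc_input_ids,
                               attention_mask=enc_attention_mask).last_hidden_state
        soft_prompt = self.proj(enc_out)                        # (B, T_enc, H_lm)
        tok_embeds = self.lm.get_input_embeddings()(lm_input_ids)
        inputs_embeds = torch.cat([soft_prompt, tok_embeds], dim=1)
        return self.lm(inputs_embeds=inputs_embeds)             # standard causal LM head
```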

Understanding LangBridge requires an appreciation of the prevailing landscape, in which LMs are predominantly trained on English-centric datasets and consequently perform poorly on reasoning tasks in low-resource languages. Prior approaches that continue training these models on domain-specific datasets in target languages face scalability issues because they require language-specific corpora.

The paper builds on the latent potential for zero-shot cross-lingual transfer in multilingual models fine-tuned on high-resource languages, which can handle tasks in languages beyond the one used during fine-tuning. This idea has been extended by efforts to align pretrained representations across modalities, such as vision and language. These threads come together in LangBridge, a method that distinctively forgoes multilingual supervision.

LangBridge: Concept and Empirical Results

LangBridge's central hypothesis is that the representations of multilingual encoders are relatively language-agnostic, so aligning an encoder with an LM's input space lets the model interpret semantics across the languages the encoder supports without extensive multilingual data. Empirical results, obtained by pairing the mT5 encoder with LMs such as MetaMath and Orca 2, show pronounced gains in multilingual reasoning, most notably substantial accuracy improvements for low-resource languages. For example, MetaMath paired with LangBridge reaches performance comparable to the far larger PaLM-540B. The paper also indicates that LangBridge's strength derives from the inherent reasoning capabilities of the original LMs rather than from the training data.
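To make the English-only training point concrete, here is a hedged training-loop sketch under the same assumptions as the architecture sketch above; `english_only_dataloader`, the label layout, and the optimizer settings are hypothetical placeholders, not the paper's recipe.

```python
import torch

model = BridgedLM()                                              # sketch class from above
optimizer = torch.optim.AdamW(model.proj.parameters(), lr=1e-4)  # only the bridge updates

for batch in english_only_dataloader:            # hypothetical loader of English-only data
    out = model(batch["enc_input_ids"], batch["enc_attention_mask"],
                batch["lm_input_ids"])
    # Assumed label layout: aligned with [soft prompt; LM tokens], with -100
    # over the soft-prompt span so the loss covers only the English targets.
    shift_logits = out.logits[:, :-1, :]
    shift_labels = batch["labels"][:, 1:]
    loss = torch.nn.functional.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1), ignore_index=-100)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

At inference time the encoder can be fed non-English questions even though training text was English, which is the zero-shot cross-lingual behavior the paper reports.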

Analysis and Conclusion

The efficacy of LangBridge is underpinned by the language-agnostic traits of multilingual representations. Principal component analysis shows that representations of diverse languages converge when processed through LangBridge. In addition, rare but noteworthy instances of accidental translation into third languages further underline the system's inherent multi-language comprehension.
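A rough outline of this kind of PCA check is shown below; `parallel_texts`, `mt5_tokenizer`, and `mt5_encoder` are assumed stand-ins for parallel evaluation data and the multilingual encoder, and mean pooling is an illustrative choice rather than the paper's exact procedure.

```python
import numpy as np
import torch
from sklearn.decomposition import PCA

def pooled_states(texts, tokenizer, encoder):
    """Mean-pool encoder hidden states per sentence (illustrative pooling choice)."""
    reps = []
    with torch.no_grad():
        for t in texts:
            enc = tokenizer(t, return_tensors="pt")
            h = encoder(**enc).last_hidden_state        # (1, T, d_model)
            reps.append(h.mean(dim=1).squeeze(0).numpy())
    return np.stack(reps)

# parallel_texts: the same questions written in several languages (assumed data).
reps = pooled_states(parallel_texts, mt5_tokenizer, mt5_encoder)
coords = PCA(n_components=2).fit_transform(reps)
# If the representations are language-agnostic, points should cluster by
# content rather than by language in this 2-D projection.
```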

LangBridge is posited as a pioneering approach that augments LMs to handle multilingual reasoning tasks without language-specific adaptation. This development promises to contribute valuably to LMs that accommodate the full spectrum of global languages, particularly by improving performance in low-resource language contexts. However, despite its promising capabilities, LangBridge models may not yet fully match the proficiency of multilingual LMs trained directly on non-English languages, and the degree of reasoning enhancement for a particular language depends on the encoder's pre-existing proficiency in that language.
