LangBridge: Multilingual Reasoning Without Multilingual Supervision (2401.10695v2)

Published 19 Jan 2024 in cs.CL

Abstract: We introduce LangBridge, a zero-shot approach to adapt LLMs for multilingual reasoning tasks without multilingual supervision. LangBridge operates by bridging two models, each specialized in different aspects: (1) one specialized in understanding multiple languages (e.g., mT5 encoder) and (2) one specialized in reasoning (e.g., MetaMath). LangBridge connects the two models by introducing minimal trainable parameters between them. Despite utilizing only English data for training, LangBridge considerably enhances the performance of LLMs on low-resource languages across mathematical reasoning, code completion, logical reasoning, and commonsense reasoning. Our analysis suggests that the efficacy of LangBridge stems from the language-agnostic characteristics of multilingual representations. We publicly release our code and models.


Summary

  • The paper introduces LangBridge, a method that enables multilingual reasoning without requiring multilingual training data.
  • It aligns a language model with a multilingual encoder using minimal trainable parameters to improve performance in low-resource languages.
  • Empirical results show that pairing MetaMath with LangBridge yields accuracy comparable to much larger models like PaLM-540B.

Introduction

The paper introduces LangBridge, an approach for adapting language models (LMs) to multilingual reasoning tasks without multilingual training data. LangBridge bridges two models, one specialized in multilingual understanding and the other in reasoning, by connecting them with minimal trainable parameters. In contrast to prior methods that required substantial multilingual supervision, LangBridge relies solely on English data during training while still achieving zero-shot cross-lingual transfer.
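The bridging setup can be pictured with a short sketch. The following is a minimal, illustrative PyTorch/Transformers rendering rather than the paper's released implementation: the model names, the single linear projection, and the simple concatenation of encoder states with token embeddings are assumptions made for clarity.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoModelForCausalLM

class BridgedLM(nn.Module):
    """Frozen multilingual encoder + frozen reasoning LM, joined by a small
    trainable projection (an illustrative sketch, not the released code)."""

    def __init__(self, enc_name="google/mt5-xl",
                 lm_name="meta-math/MetaMath-7B-V1.0"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(enc_name).encoder  # mT5 encoder stack only
        self.lm = AutoModelForCausalLM.from_pretrained(lm_name)
        for p in self.encoder.parameters():
            p.requires_grad = False   # multilingual understanding stays frozen
        for p in self.lm.parameters():
            p.requires_grad = False   # reasoning ability stays frozen
        # The only trainable piece: map encoder states into the LM's embedding space.
        self.proj = nn.Linear(self.encoder.config.d_model, self.lm.config.hidden_size)

    def forward(self, enc_input_ids, enc_attention_mask, lm_input_ids):
        enc_out = self.encoder(input_ids=enc_input_ids,
                               attention_mask=enc_attention_mask).last_hidden_state
        soft_prompt = self.proj(enc_out)                        # (B, T_enc, H_lm)
        tok_embeds = self.lm.get_input_embeddings()(lm_input_ids)
        inputs_embeds = torch.cat([soft_prompt, tok_embeds], dim=1)
        return self.lm(inputs_embeds=inputs_embeds)             # standard causal LM head
```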

Understanding LangBridge requires an appreciation of the prevailing landscape, in which LMs are predominantly trained on English-centric datasets and consequently perform poorly on reasoning tasks in low-resource languages. Prior approaches that continue training these models on domain-specific datasets in target languages face scalability issues because they require language-specific corpora.

The paper builds on the latent potential for zero-shot cross-lingual transfer in multilingual models fine-tuned on high-resource languages, which can handle tasks in languages beyond the one used during fine-tuning. This idea has been extended by efforts to align pretrained representations across modalities, such as vision and language. These threads come together in LangBridge, a method that distinctively forgoes multilingual supervision.

LangBridge: Concept and Empirical Results

LangBridge's central hypothesis is that the representations of multilingual encoders are relatively language-agnostic, so aligning an encoder with an LM's input space lets the model interpret semantics across the languages the encoder supports without extensive multilingual data. Empirical results, obtained by pairing the mT5 encoder with LMs such as MetaMath and Orca 2, show pronounced gains in multilingual reasoning, most notably substantial accuracy improvements for low-resource languages. For example, MetaMath paired with LangBridge reaches performance comparable to the far larger PaLM-540B. The paper also indicates that LangBridge's strength derives from the inherent reasoning capabilities of the original LMs rather than from the training data.
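To make the English-only training point concrete, here is a hedged training-loop sketch under the same assumptions as the architecture sketch above; `english_only_dataloader`, the label layout, and the optimizer settings are hypothetical placeholders, not the paper's recipe.

```python
import torch

model = BridgedLM()                                              # sketch class from above
optimizer = torch.optim.AdamW(model.proj.parameters(), lr=1e-4)  # only the bridge updates

for batch in english_only_dataloader:            # hypothetical loader of English-only data
    out = model(batch["enc_input_ids"], batch["enc_attention_mask"],
                batch["lm_input_ids"])
    # Assumed label layout: aligned with [soft prompt; LM tokens], with -100
    # over the soft-prompt span so the loss covers only the English targets.
    shift_logits = out.logits[:, :-1, :]
    shift_labels = batch["labels"][:, 1:]
    loss = torch.nn.functional.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1), ignore_index=-100)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

At inference time the encoder can be fed non-English questions even though training text was English, which is the zero-shot cross-lingual behavior the paper reports.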

Analysis and Conclusion

The efficacy of LangBridge is underpinned by the language-agnostic traits of multilingual representations. Principal component analysis shows that representations of diverse languages converge when processed through LangBridge. In addition, rare but noteworthy instances of accidental translation into third languages further underline the system's inherent multi-language comprehension.
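A rough outline of this kind of PCA check is shown below; `parallel_texts`, `mt5_tokenizer`, and `mt5_encoder` are assumed stand-ins for parallel evaluation data and the multilingual encoder, and mean pooling is an illustrative choice rather than the paper's exact procedure.

```python
import numpy as np
import torch
from sklearn.decomposition import PCA

def pooled_states(texts, tokenizer, encoder):
    """Mean-pool encoder hidden states per sentence (illustrative pooling choice)."""
    reps = []
    with torch.no_grad():
        for t in texts:
            enc = tokenizer(t, return_tensors="pt")
            h = encoder(**enc).last_hidden_state        # (1, T, d_model)
            reps.append(h.mean(dim=1).squeeze(0).numpy())
    return np.stack(reps)

# parallel_texts: the same questions written in several languages (assumed data).
reps = pooled_states(parallel_texts, mt5_tokenizer, mt5_encoder)
coords = PCA(n_components=2).fit_transform(reps)
# If the representations are language-agnostic, points should cluster by
# content rather than by language in this 2-D projection.
```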

LangBridge is posited as a pioneering approach that augments LMs to handle multilingual reasoning tasks without language-specific adaptation. This development promises to contribute valuably to LMs that accommodate the full spectrum of global languages, particularly by improving performance in low-resource language contexts. However, despite its promising capabilities, LangBridge models may not yet fully match the proficiency of multilingual LMs trained directly on non-English languages, and the degree of reasoning enhancement for a particular language depends on the encoder's pre-existing proficiency in that language.
