
xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning (2401.07037v1)

Published 13 Jan 2024 in cs.CL and cs.AI

Abstract: Chain-of-thought (CoT) has emerged as a powerful technique to elicit reasoning in LLMs and improve a variety of downstream tasks. CoT mainly demonstrates excellent performance in English, but its usage in low-resource languages is constrained due to poor language generalization. To bridge the gap among different languages, we propose a cross-lingual instruction fine-tuning framework (xCoT) to transfer knowledge from high-resource languages to low-resource languages. Specifically, multilingual instruction training data (xCoT-INSTRUCT) is created to encourage semantic alignment across languages. We introduce cross-lingual in-context few-shot learning (xICL) to accelerate multilingual agreement in instruction tuning, where some fragments of source-language examples are randomly substituted with their counterpart translations in target languages. During multilingual instruction tuning, we adopt a random online CoT strategy to enhance the multilingual reasoning ability of the LLM by first translating the query to another language and then answering in English. To further facilitate language transfer, we leverage the high-resource CoT to supervise the training of low-resource languages with cross-lingual distillation. Experimental results on previous benchmarks demonstrate the superior performance of xCoT in reducing the gap among different languages, highlighting its potential to bridge the cross-lingual gap.
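As a rough illustration of the xICL substitution step described in the abstract, the sketch below randomly mixes fragments of a source-language demonstration with their target-language translations. The fragment granularity (sentence level), the swap probability, and the function and variable names are assumptions made for illustration only; the abstract does not specify how the paper constructs its data.

```python
import random

def xicl_substitute(source_fragments, target_fragments, swap_prob=0.5, seed=0):
    """Randomly replace source-language fragments with their target-language
    translations, producing a code-switched demonstration (illustrative only)."""
    rng = random.Random(seed)
    return " ".join(
        tgt if rng.random() < swap_prob else src
        for src, tgt in zip(source_fragments, target_fragments)
    )

# Hypothetical example: an English math-word-problem demonstration
# mixed with Swahili translations of the same sentences.
english = [
    "Natalia sold 48 clips in April.",
    "She sold half as many clips in May.",
    "How many clips did she sell in total?",
]
swahili = [
    "Natalia aliuza klipu 48 mwezi wa Aprili.",
    "Aliuza nusu ya klipu hizo mwezi wa Mei.",
    "Je, aliuza klipu ngapi kwa jumla?",
]

print(xicl_substitute(english, swahili, swap_prob=0.5))
```

The mixed demonstration would then serve as an in-context example during instruction tuning, encouraging the model to align the two languages within a single prompt.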

Authors (11)
  1. Linzheng Chai (16 papers)
  2. Jian Yang (503 papers)
  3. Tao Sun (143 papers)
  4. Hongcheng Guo (39 papers)
  5. Jiaheng Liu (100 papers)
  6. Bing Wang (246 papers)
  7. Xiannian Liang (1 paper)
  8. Jiaqi Bai (19 papers)
  9. Tongliang Li (18 papers)
  10. Qiyao Peng (19 papers)
  11. Zhoujun Li (122 papers)
Citations (41)