
BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models (2306.10968v2)

Published 19 Jun 2023 in cs.CL and cs.AI

Abstract: LLMs have demonstrated remarkable prowess in language understanding and generation. Advancing from foundation LLMs to instruction-following LLMs, instruction tuning plays a vital role in aligning LLMs to human preferences. However, the existing LLMs are usually focused on English, leading to inferior performance in non-English languages. In order to improve the performance for non-English languages, it is necessary to collect language-specific training data for foundation LLMs and construct language-specific instructions for instruction tuning, both of which are heavy loads. To minimize human workload, we propose to transfer the capabilities of language generation and instruction following from English to other languages through an interactive translation task. We have developed BayLing, an instruction-following LLM by utilizing LLaMA as the foundation LLM and automatically constructing interactive translation instructions for instruction tuning. Extensive assessments demonstrate that BayLing achieves comparable performance to GPT-3.5-turbo, despite utilizing a considerably smaller parameter size of only 13 billion. Experimental results on translation tasks show that BayLing achieves 95% of single-turn translation capability compared to GPT-4 with automatic evaluation and 96% of interactive translation capability compared to GPT-3.5-turbo with human evaluation. To estimate the performance on general tasks, we created a multi-turn instruction test set called BayLing-80. The experimental results on BayLing-80 indicate that BayLing achieves 89% of performance compared to GPT-3.5-turbo. BayLing also demonstrates outstanding performance on knowledge assessment of Chinese GaoKao and English SAT, second only to GPT-3.5-turbo among a multitude of instruction-following LLMs. Demo, homepage, code and models of BayLing are available.

Analysis of BayLing: Enhancing Cross-lingual Capabilities in LLMs Through Interactive Translation

The paper presents BayLing, a sophisticated LLM developed to improve cross-lingual capabilities and instruction-following proficiency in non-English languages. Building on foundation LLMs such as LLaMA, BayLing leverages interactive translation tasks to facilitate cross-lingual alignment, reduce language-specific training burdens, and enhance performance in multilingual contexts. The following analysis covers the model's design and its empirical evaluation across diverse language tasks.

BayLing is built with LLaMA as the foundation model. Its central innovation is instruction tuning through interactive translation tasks: instead of gathering extensive language-specific training data, the model transfers English language generation and instruction-following capabilities to non-English languages. The multi-turn interaction between users and the model during translation is what drives this transfer while refining generation quality.

Key Components and Methodology

  1. Foundation Model Selection: BayLing is structured upon LLaMA, an established LLM known for its robust English understanding capabilities. By building on this strong foundation, BayLing focuses on cross-lingual proficiency while maintaining a manageable model size.
  2. Interactive Translation: The interactive translation mechanism serves two purposes: it aligns other languages with English, and it reinforces the model's ability to interpret and act on human instructions. This tuning strategy bypasses the heavy demand for non-English datasets, extending the foundation model's English-centric capabilities to other languages through cross-lingual tasks (a sketch of what such a training record might look like follows this list).
  3. Instruction Tuning: Because the interactive translation instructions are inherently multi-turn, they also train contextual comprehension and instruction following within multi-turn dialogue, which carries over to broader NLP tasks.
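
To make the data construction concrete, the sketch below shows how a single multi-turn interactive translation exchange could be serialized as an instruction-tuning record and flattened into supervised text. The schema, role tags, and sample sentences are illustrative assumptions for exposition, not the authors' released format.

```python
# A minimal sketch (assumed schema) of a multi-turn interactive translation
# record of the kind described above; field names and sentences are illustrative.
import json

record = {
    "id": "interactive-zh-en-0001",
    "conversations": [
        # Turn 1: the user requests an initial translation.
        {"role": "user",
         "content": "Translate to English: 机器翻译让跨语言交流更容易。"},
        {"role": "assistant",
         "content": "Machine translation makes cross-lingual communication easier."},
        # Turn 2: a follow-up revision instruction, which couples cross-lingual
        # alignment with instruction following in the same example.
        {"role": "user",
         "content": "Please make the translation more formal."},
        {"role": "assistant",
         "content": "Machine translation facilitates communication across languages."},
    ],
}

def to_training_text(rec, user_tag="USER:", assistant_tag="ASSISTANT:"):
    """Flatten the conversation into one supervised training string,
    a common convention for multi-turn instruction tuning."""
    parts = []
    for turn in rec["conversations"]:
        tag = user_tag if turn["role"] == "user" else assistant_tag
        parts.append(f"{tag} {turn['content']}")
    return "\n".join(parts)

print(to_training_text(record))
print(json.dumps(record, ensure_ascii=False, indent=2))
```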

Evaluation and Results

Extensive evaluations reveal BayLing’s proficiency:

  • Translation Tasks: On Chinese-English and German-English benchmarks, BayLing attains roughly 95% of GPT-4's single-turn translation capability under automatic evaluation and 96% of GPT-3.5-turbo's interactive translation capability under human evaluation.
  • General Tasks: Evaluations on the BayLing-80 test set show BayLing reaching 89% of the performance of GPT-3.5-turbo, with particular strength in generic and knowledge tasks (see the scoring sketch after this list).
  • Standardized Tests: BayLing scores competitively on the Chinese GaoKao and English SAT knowledge assessments, second only to GPT-3.5-turbo among the instruction-following LLMs compared, underscoring effective knowledge transfer from English-centric corpora to other languages.
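
Relative figures such as "89% of GPT-3.5-turbo" on BayLing-80 are ratios of aggregate quality ratings for the two models on the same test set. The sketch below shows one way such a relative score can be computed from per-question judge ratings; the 1-10 scale and the sample numbers are assumptions for illustration, not values from the paper.

```python
# A minimal sketch of computing a relative performance score from per-question
# judge ratings; the rating scale and numbers below are illustrative assumptions.

def relative_score(model_ratings, reference_ratings):
    """Ratio of summed ratings: candidate model vs. reference model."""
    assert len(model_ratings) == len(reference_ratings)
    return sum(model_ratings) / sum(reference_ratings)

# Hypothetical ratings for five test instructions (1-10 scale).
bayling_ratings = [8.0, 7.5, 9.0, 7.5, 8.0]
gpt35_ratings = [9.0, 8.5, 9.5, 9.0, 9.0]

print(f"Relative performance: {relative_score(bayling_ratings, gpt35_ratings):.0%}")
```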

Key Outcomes and Implications

  • Cross-lingual Transfer Without Pre-training: BayLing's use of interactive tasks effectively transfers language generation and instruction compliance between languages, sidestepping the traditional requirement for large-scale non-English language pre-training.
  • Integration of Task Capabilities: Interactive translation bundles several abilities into one training signal, giving BayLing a streamlined way to improve cross-lingual alignment and adherence to human instructions simultaneously.
  • Benchmark Setting: BayLing positions itself as a measurable, openly available reference point for multilingual translation, encouraging further advances and model comparisons on translation tasks.

Future Prospects and Considerations

BayLing provides a compelling blueprint for future cross-lingual work in LLM research. Its methodology encourages leveraging strong foundation models and task-specific tuning to expand LLM competencies efficiently. The paper also highlights several areas for future exploration, including math, coding, and reasoning tasks, where performance still lags behind leading LLMs such as GPT-3.5-turbo.

In essence, BayLing exemplifies a balanced and insightful approach to augmenting non-English language capabilities in LLMs. Its elegance lies in streamlining resource input while maximizing linguistic output, paving the way for extensive applications and fostering cross-lingual understanding through intelligent interaction and alignment tactics.

Authors (11)
  1. Shaolei Zhang
  2. Qingkai Fang
  3. Zhuocheng Zhang
  4. Zhengrui Ma
  5. Yan Zhou
  6. Langlin Huang
  7. Mengyu Bu
  8. Shangtong Gui
  9. Yunji Chen
  10. Xilin Chen
  11. Yang Feng