Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model (2404.16766v1)

Published 25 Apr 2024 in cs.CL and cs.AI

Abstract: While supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of foundation LLMs to specific preferences, concerns have been raised about the depth of this alignment, with some critiques suggesting it is merely "superficial". We critically examine this hypothesis within the scope of cross-lingual generation tasks, proposing that the effectiveness of SFT may be constrained by its reliance on prior tokens to guide cross-lingual generation. Based on this crucial insight, and in response to the challenges posed by the costly and limited availability of non-English data for SFT, we introduce a novel training-free alignment method named PreTTY, which employs minimal task-related prior tokens to bridge the foundation LLM and the SFT LLM, achieving comparable performance without training. Experiments on machine translation and part-of-speech tagging across eight languages demonstrate the efficacy of PreTTY in cross-lingual settings. Remarkably, by initiating the decoding process with only one or two prior tokens, foundation LLMs can achieve performance comparable to their SFT counterparts. This method presents a cost-effective alternative to SFT and advances the democratization of multilingual LLMs.
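
The abstract describes PreTTY as seeding the foundation model's decoding with one or two task-related prior tokens instead of fine-tuning it. The snippet below is a minimal sketch of that idea using Hugging Face transformers; the model name, prompt wording, and choice of prior token are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of the idea described in the abstract: instead of SFT,
# seed the foundation (base) model's decoding with one or two task-related
# "prior tokens" so generation continues from them in the target language.
# The model name, prompt wording, and prior token below are assumptions
# for illustration, not the paper's exact configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # assumed base (non-SFT) model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

source = "The weather is lovely today."
prompt = (
    "Translate the following English sentence into German.\n"
    f"English: {source}\n"
    "German:"
)

# One or two target-language prior tokens appended to the prompt; decoding
# then starts *after* them, pulling the rest of the output along.
prior_text = " Das"

inputs = tokenizer(prompt + prior_text, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Keep only the newly generated tokens and prepend the prior text.
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(prior_text + tokenizer.decode(new_tokens, skip_special_tokens=True))
```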

Authors (5)
  1. Runzhe Zhan
  2. Xinyi Yang
  3. Derek F. Wong
  4. Lidia S. Chao
  5. Yue Zhang
Citations (6)
