
Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages (2404.11553v3)

Published 17 Apr 2024 in cs.CL, cs.AI, and cs.LG

Abstract: The development of LLMs relies on extensive text corpora, which are often unevenly distributed across languages. This imbalance results in LLMs performing significantly better on high-resource languages like English, German, and French, while their capabilities in low-resource languages remain inadequate. Currently, there is a lack of quantitative methods to evaluate the performance of LLMs in these low-resource languages. To address this gap, we propose the Language Ranker, an intrinsic metric designed to benchmark and rank languages based on LLM performance using internal representations. By comparing the LLM's internal representation of various languages against a baseline derived from English, we can assess the model's multilingual capabilities in a robust and language-agnostic manner. Our analysis reveals that high-resource languages exhibit higher similarity scores with English, demonstrating superior performance, while low-resource languages show lower similarity scores, underscoring the effectiveness of our metric in assessing language-specific capabilities. Moreover, the experiments show a strong correlation between the LLM's performance in different languages and the proportion of those languages in its pre-training corpus. These results confirm the efficacy of the Language Ranker as a tool for evaluating LLM performance across different languages, particularly those with limited resources.
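The abstract describes scoring each language by comparing the model's internal representation of text in that language against an English baseline. The sketch below illustrates that idea in minimal form: it mean-pools hidden states from a Transformer and computes cosine similarity between parallel English and non-English sentences. The model name (gpt2 as a freely downloadable stand-in), the pooling strategy, the layer choice, and the example sentence pairs are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of an internal-representation similarity score, in the spirit
# of the Language Ranker. Model, pooling, layer, and sentences are illustrative
# assumptions, not the paper's exact configuration.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any causal LM from the Hugging Face hub

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def sentence_representation(text: str, layer: int = -1) -> torch.Tensor:
    """Mean-pool the hidden states of one layer into a single sentence vector."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    hidden = outputs.hidden_states[layer]   # shape: (1, seq_len, hidden_dim)
    return hidden.mean(dim=1).squeeze(0)    # shape: (hidden_dim,)

def similarity_to_english(english_text: str, other_text: str) -> float:
    """Cosine similarity between representations of a parallel sentence pair."""
    en_vec = sentence_representation(english_text)
    other_vec = sentence_representation(other_text)
    return torch.nn.functional.cosine_similarity(en_vec, other_vec, dim=0).item()

# Hypothetical parallel pairs; in practice the score would be averaged over a
# parallel corpus, and languages ranked by their mean similarity to English.
parallel_pairs = {
    "German": ("The weather is nice today.", "Das Wetter ist heute schön."),
    "Swahili": ("The weather is nice today.", "Hali ya hewa ni nzuri leo."),
}
for lang, (en, other) in parallel_pairs.items():
    print(lang, round(similarity_to_english(en, other), 3))
```

Under the paper's finding, a higher average score would indicate a high-resource language whose representations align closely with the English baseline, while lower scores would flag languages the model handles poorly.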


