
Walia-LLM: Enhancing Amharic-LLaMA by Integrating Task-Specific and Generative Datasets (2402.08015v5)

Published 12 Feb 2024 in cs.CL

Abstract: LLMs have received a lot of attention in NLP research because of their exceptional performance in understanding and generating human languages. However, low-resource languages are left behind due to the unavailability of resources. In this work, we focus on enhancing the LLaMA-2-Amharic model by integrating task-specific and generative datasets to improve LLM performance for Amharic. We compile an Amharic instruction fine-tuning dataset and fine-tune the LLaMA-2-Amharic model on it. The fine-tuned model shows promising results on different NLP tasks. We open-source our dataset creation pipeline, instruction datasets, trained models, and evaluation outputs to promote language-specific studies on these models.
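The setup described in the abstract, supervised instruction tuning of a LLaMA-2-Amharic base model on a compiled instruction dataset, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' released pipeline: the base checkpoint identifier, the Alpaca-style prompt template, the JSONL file name, the use of LoRA, and all hyperparameters are assumptions made for the example.

```python
# Minimal sketch: instruction fine-tuning of a LLaMA-2-Amharic base with LoRA.
# Checkpoint ID, dataset path, prompt template, and hyperparameters are illustrative.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

BASE = "iocuydi/llama-2-amharic-3784m"  # assumed Hub ID; substitute the actual base checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token

def to_features(example):
    # Assumed Alpaca-style template: instruction + optional input -> response.
    text = (f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example.get('input', '')}\n\n"
            f"### Response:\n{example['output']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=1024)

# Hypothetical JSONL file with instruction / input / output fields.
data = load_dataset("json", data_files="amharic_instructions.jsonl")["train"]
data = data.map(to_features, remove_columns=data.column_names)

model = AutoModelForCausalLM.from_pretrained(BASE)
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="walia-sft", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=data,
    # mlm=False gives standard causal-LM labels (shifted next-token prediction).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice one would point this at the released instruction datasets and checkpoint and adjust the adapter rank, sequence length, and batch size to the available hardware.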

Authors (9)