Walia-LLM: Enhancing Amharic-LLaMA by Integrating Task-Specific and Generative Datasets (2402.08015v5)
Published 12 Feb 2024 in cs.CL
Abstract: LLMs have received considerable attention in NLP research because of their exceptional performance in understanding and generating human language. However, low-resource languages are left behind due to the unavailability of resources. In this work, we focus on enhancing the LLaMA-2-Amharic model by integrating task-specific and generative datasets to improve LLM performance for Amharic. We compile an Amharic instruction fine-tuning dataset and use it to fine-tune the LLaMA-2-Amharic model. The fine-tuned model shows promising results on different NLP tasks. We open-source our dataset creation pipeline, instruction datasets, trained models, and evaluation outputs to promote language-specific studies on these models.
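
The abstract describes compiling an Amharic instruction dataset and fine-tuning the LLaMA-2-Amharic model on it. Below is a minimal sketch of what such instruction fine-tuning could look like with Hugging Face transformers and peft (LoRA); it is not the authors' released training pipeline, and the model id, dataset file, prompt template, and hyperparameters are assumptions for illustration.

```python
# Minimal sketch of instruction fine-tuning a LLaMA-2-based Amharic model with LoRA.
# Model id, dataset path, prompt template, and hyperparameters are illustrative
# assumptions, not the authors' released artifacts or training configuration.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model

BASE_MODEL = "iocuydi/llama-2-amharic-3784m"  # assumed Hugging Face model id
DATA_FILE = "amharic_instructions.jsonl"      # hypothetical instruction dataset

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Attach low-rank adapters so only a small fraction of parameters is trained.
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)

def tokenize_example(ex):
    # Alpaca-style prompt: instruction followed by the expected response.
    prompt = (f"### Instruction:\n{ex['instruction']}\n\n"
              f"### Response:\n{ex['output']}{tokenizer.eos_token}")
    return tokenizer(prompt, truncation=True, max_length=1024)

dataset = load_dataset("json", data_files=DATA_FILE, split="train")
dataset = dataset.map(tokenize_example, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="walia-llm-lora",
                           per_device_train_batch_size=4,
                           num_train_epochs=3,
                           learning_rate=2e-4,
                           logging_steps=50,
                           save_strategy="epoch"),
    train_dataset=dataset,
    # mlm=False makes the collator copy input_ids into labels for causal LM loss.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("walia-llm-lora")  # saves only the LoRA adapter weights
```

Parameter-efficient adapters of this kind keep the trainable parameter count and memory footprint small, which is one reason LoRA/QLoRA-style setups are a common choice when adapting a 7B-scale base model to a low-resource language.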
Authors:
- Israel Abebe Azime
- Mitiku Yohannes Fuge
- Atnafu Lambebo Tonja
- Tadesse Destaw Belay
- Aman Kassahun Wassie
- Eyasu Shiferaw Jada
- Yonas Chanie
- Walelign Tewabe Sewunetie
- Seid Muhie Yimam